THE APPLICATION IDENTIFIED ABOVE HAS BEEN EXAMINED AND IS ALLOWED FOR ISSUANCE AS A PATENT.
PROSECUTION ON THE MERITS IS CLOSED. THIS NOTICE OF ALLOWANCE IS NOT A GRANT OF PATENT RIGHTS.
THIS APPLICATION IS SUBJECT TO WITHDRAWAL FROM ISSUE AT THE INITIATIVE OF THE OFFICE OR UPON 
PETITION BY THE APPLICANT. SEE 37 CFR 1.313 AND MPEP 1308. 

THE ISSUE FEE AND PUBLICATION FEE (IF REQUIRED) MUST BE PAID WITHIN THREE MONTHS FROM THE 
MAILING DATE OF THIS NOTICE OR THIS APPLICATION SHALL BE REGARDED AS ABANDONED. THIS 
STATUTORY PERIOD CANNOT BE EXTENDED. SEE 35 U.S.C. 151. THE ISSUE FEE DUE INDICATED ABOVE
REFLECTS A CREDIT FOR ANY PREVIOUSLY PAID ISSUE FEE APPLIED IN THIS APPLICATION. THE PTOL-85B (OR 
AN EQUIVALENT) MUST BE RETURNED WITHIN THIS PERIOD EVEN IF NO FEE IS DUE OR THE APPLICATION WILL 
BE REGARDED AS ABANDONED. 

HOW TO REPLY TO THIS NOTICE: 



I. Review the SMALL ENTITY status shown above. 

If the SMALL ENTITY is shown as YES, verify your current 
SMALL ENTITY status: 

A. If the status is the same, pay the TOTAL FEE(S) DUE shown 
above, 

B. If the status above is to be removed, check box 5b on Part B -
Fee(s) Transmittal and pay the PUBLICATION FEE (if required)
and twice the amount of the ISSUE FEE shown above, or



If the SMALL ENTITY is shown as NO: 

A. Pay TOTAL FEE(S) DUE shown above, or 

B. If applicant claimed SMALL ENTITY status before, or is now 
claiming SMALL ENTITY status, check box 5a on Part B - Fee(s) 
Transmittal and pay the PUBLICATION FEE (if required) and 1/2 
the ISSUE FEE shown above. 


II. PART B - FEE(S) TRANSMITTAL should be completed and returned to the United States Patent and Trademark Office (USPTO) with 
your ISSUE FEE and PUBLICATION FEE (if required). Even if the fee(s) have already been paid, Part B - Fee(s) Transmittal should be 
completed and returned. If you are charging the fee(s) to your deposit account, section "4b" of Part B - Fee(s) Transmittal should be 
completed and an extra copy of the form should be submitted. 

III. All communications regarding this application must give the application number. Please direct all communications prior to issuance to 
Mail Stop ISSUE FEE unless advised to the contrary. 

IMPORTANT REMINDER: Utility patents issuing on applications filed on or after Dec. 12, 1980 may require payment of 
maintenance fees. It is patentee's responsibility to ensure timely payment of maintenance fees when due. 
PART B - FEE(S) TRANSMITTAL
Complete and send this form, together with applicable fee(s), to:

Mail: Mail Stop ISSUE FEE
Commissioner for Patents
P.O. Box 1450
Alexandria, Virginia 22313-1450

or Fax: (571) 273-2885



INSTRUCTIONS: This form should be used for transmitting the ISSUE FEE and PUBLICATION FEE (if required). Blocks 1 through 5 should be completed where
appropriate. All further correspondence including the Patent, advance orders and notification of maintenance fees will be mailed to the current correspondence address as 
indicated unless corrected below or directed otherwise in Block 1, by (a) specifying a new correspondence address; and/or (b) indicating a separate "FEE ADDRESS" for 
maintenance fee notifications. 



CURRENT CORRESPONDENCE ADDRESS (Note: Use Block 1 for any change of address) 
FISH & RICHARDSON, PC
12390 EL CAMINO REAL 
SAN DIEGO, CA 92130-2081 



Note: A certificate of mailing can only be used for domestic mailings of the 
Fee(s) Transmittal. This certificate cannot be used for any other accompanying 
papers. Each additional paper, such as an assignment or formal drawing, must 
have its own certificate of mailing or transmission. 

Certificate of Mailing or Transmission 
I hereby certify that this Fee(s) Transmittal is being deposited with the United
States Postal Service with sufficient postage for first class mail in an envelope 
addressed to the Mail Stop ISSUE FEE address above, or being facsimile 
EXAMINER: FREJD, RUSSELL WARREN
ART UNIT: 2128
CLASS-SUBCLASS: 703-015000



1. Change of correspondence address or indication of "Fee Address" (37
CFR 1.363).

□ Change of correspondence address (or Change of Correspondence
Address form PTO/SB/122) attached.

□ "Fee Address" indication (or "Fee Address" Indication form 
PTO/SB/47; Rev 03-02 or more recent) attached. Use of a Customer 
Number is required. 



2. For printing on the patent front page, list 

(1) the names of up to 3 registered patent attorneys 
or agents OR, alternatively, 

(2) the name of a single firm (having as a member a 
registered attorney or agent) and the names of up to 
2 registered patent attorneys or agents. If no name is 
listed, no name will be printed. 



3. ASSIGNEE NAME AND RESIDENCE DATA TO BE PRINTED ON THE PATENT (print or type)

PLEASE NOTE: Unless an assignee is identified below, no assignee data will appear on the patent. If an assignee is identified below, the document has been filed for
recordation as set forth in 37 CFR 3.11. Completion of this form is NOT a substitute for filing an assignment.



(A) NAME OF ASSIGNEE 



(B) RESIDENCE: (CITY and STATE OR COUNTRY) 



Please check the appropriate assignee category or categories (will not be printed on the patent) : □ Individual □ Corporation or other private group entity □Government 

4a. The following fee(s) are enclosed: 4b. Payment of Fee(s):

□ Issue Fee □ a check in the amount of the fee(s) is enclosed. 

□ Publication Fee (No small entity discount permitted) □ Payment by credit card. Form PTO-2038 is attached. 



□ Advance Order - # of Copies



□ The Director is hereby authorized to charge the required fee(s), or credit any overpayment to
Deposit Account Number (enclose an extra copy of this form).



5. Change in Entity Status (from status indicated above) 

□ a. Applicant claims SMALL ENTITY status. See 37 CFR 1.27. 



□ b. Applicant is no longer claiming SMALL ENTITY status. See 37 CFR 1.27(g)(2). 



The Director of the USPTO is requested to apply the Issue Fee and Publication Fee (if any) or to re-apply any previously paid issue fee to the application identified above.



Authorized Signature _ 



Date 



Typed or printed name _ 



Registration No. 




Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number. 
Determination of Patent Term Adjustment under 35 U.S.C. 154(b)

(application filed on or after May 29, 2000) 

The Patent Term Adjustment to date is 682 day(s). If the issue fee is paid on the date that is three months after the 
mailing date of this notice and the patent issues on the Tuesday before the date that is 28 weeks (six and a half 
months) after the mailing date of this notice, the Patent Term Adjustment will be 682 day(s). 

If a Continued Prosecution Application (CPA) was filed in the above-identified application, the filing date that 
determines Patent Term Adjustment is the filing date of the most recent CPA. 

Applicant will be able to obtain more detailed information by accessing the Patent Application Information Retrieval 
(PAIR) WEB site (http://pair.uspto.gov). 

Any questions regarding the Patent Term Extension or Adjustment determination should be directed to the Office of 
Patent Legal Administration at (571) 272-7702. Questions relating to issue and publication fee payments should be 
directed to the Customer Service Center of the Office of Patent Publication at (703) 305-8283. 
Notice of Allowability

Application No.: 09/941,952
Applicant(s): WHEELER ET AL
Examiner: Russell Frejd
Art Unit: 2128
- The MAILING DATE of this communication appears on the cover sheet with the correspondence address -

All claims being allowable, PROSECUTION ON THE MERITS IS (OR REMAINS) CLOSED in this application. If not included 
herewith (or previously mailed), a Notice of Allowance (PTOL-85) or other appropriate communication will be mailed in due course. THIS 
NOTICE OF ALLOWABILITY IS NOT A GRANT OF PATENT RIGHTS. This application is subject to withdrawal from issue at the initiative 
of the Office or upon petition by the applicant. See 37 CFR 1.313 and MPEP 1308. 

1. ☒ This communication is responsive to applicant's amendment received 23-May-2005.

2. ☒ The allowed claim(s) is/are 1-5, 7-15, 17-25 and 27-30.

3. ☒ The drawings filed on 28 January 2003 are accepted by the Examiner.

4. □ Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).

a) □ All b) □ Some* c) □ None of the:

1. □ Certified copies of the priority documents have been received.

2. □ Certified copies of the priority documents have been received in Application No. . 

3. □ Copies of the certified copies of the priority documents have been received in this national stage application from the 

International Bureau (PCT Rule 17.2(a)). 
* Certified copies not received: . 

Applicant has THREE MONTHS FROM THE "MAILING DATE" of this communication to file a reply complying with the requirements 
noted below. Failure to timely comply will result in ABANDONMENT of this application. 
THIS THREE-MONTH PERIOD IS NOT EXTENDABLE 

5. □ A SUBSTITUTE OATH OR DECLARATION must be submitted. Note the attached EXAMINER'S AMENDMENT or NOTICE OF 

INFORMAL PATENT APPLICATION (PTO-152) which gives reason(s) why the oath or declaration is deficient. 

6. □ CORRECTED DRAWINGS (as "replacement sheets") must be submitted.

(a) □ including changes required by the Notice of Draftsperson's Patent Drawing Review (PTO-948) attached

1) □ hereto or 2) □ to Paper No./Mail Date . 

(b) □ including changes required by the attached Examiner's Amendment / Comment or in the Office action of 

Paper No./Mail Date . 

Identifying indicia such as the application number (see 37 CFR 1.84(c)) should be written on the drawings in the front (not the back) of
each sheet. Replacement sheet(s) should be labeled as such in the header according to 37 CFR 1.121(d).

7. □ DEPOSIT OF and/or INFORMATION about the deposit of BIOLOGICAL MATERIAL must be submitted. Note the 

attached Examiner's comment regarding REQUIREMENT FOR THE DEPOSIT OF BIOLOGICAL MATERIAL. 



Attachment(s) 

1. ☒ Notice of References Cited (PTO-892)
2. □ Notice of Draftperson's Patent Drawing Review (PTO-948)
3. ☒ Information Disclosure Statements (PTO-1449 or PTO/SB/08), Paper No./Mail Date 7.20.05
4. □ Examiner's Comment Regarding Requirement for Deposit of Biological Material
5. □ Notice of Informal Patent Application (PTO-152)
6. □ Interview Summary (PTO-413), Paper No./Mail Date .
7. □ Examiner's Amendment/Comment
8. ☒ Examiner's Statement of Reasons for Allowance
9. □ Other .
Notice of Allowability



Part of Paper No./Mail Date 08082005 



Serial Number: 09/941,952 Page 1 

In re Application of: Wheeler et al. 

Allowance of Application # 09/941,952 

1. The following communication is in response to applicant's amendment received 23-May- 
2005, and applicant's IDS received on 20-July-2005. 

Reasons for Allowance 

2. The following is an Examiner's Statement of Reasons for the indication of allowable 
subject matter. The instant application is directed to a non-obvious improvement over the 
invention described in U.S. Patent No. 5,220,512, the improvement comprising an apparatus and 
method for simulating a logic design comprised of combinatorial logic and state logic, wherein 
clock domains are identified for combinatorial logic and state logic using separate graphic 
elements, computer code is generated based on the clock domains that simulate operation of 
portions of the logic design, and the computer code is associated with the graphic elements. This 
patentable distinction is included in each of the independent claims, nos. 1, 11, and 21. The art
of record, either individually or in combination, fails to teach, suggest, or render obvious the
useful, concrete and tangible simulation of a logic design comprised of combinatorial logic and
state logic having the corresponding structure which is disclosed in the specification
and equivalents thereof at least at page 2, line 17 through page 17, line 4, and Figures 1-7. In
view of the foregoing, the claims of the present application are found to be patentable over the 
prior art. 

Response Guidelines 

3. Any comments considered necessary by applicant MUST be submitted no later than the 
payment of the Issue Fee and, to avoid processing delays, should preferably accompany the
Issue Fee. Such submissions should clearly be labeled "Comments on Statement of Reasons for
Allowance". 



3.1 Any response to the Examiner in regard to this allowance should be 

directed to: Russell Frejd, telephone number (571) 272-3779, Monday-Friday 
from 0530 to 1400 ET, or the examiner's supervisor, Jean Homere, 
telephone number (571) 272-3780. Inquiries of a general nature or
relating to the status of this application should be directed to the TC2100 
Group Receptionist (571) 272-2100. 

mailed to: Commissioner of Patents and Trademarks 
P.O. Box 1450, Alexandria, VA 22313-1450 

or faxed to: (571) 273-8300

Hand-delivered responses should be brought to the Customer Service Window, Randolph Building, 401 
Dulany Street, Alexandria, VA, 22314. 
Abstract

We present a technique using diverse duplication to 
implement concurrent error detection (CED) in sequential 
logic circuits. We examine three different approaches for 
this purpose: (1) Identical state encoding of the two 
sequential logic implementations, duplication of flip-flops, 
diverse implementation of the combinational logic part 
(output logic and next-state logic) and comparators on 
flip-flop outputs and primary outputs; (2) Diverse state 
encoding of the two implementations, duplication of flip- 
flops, diverse combinational logic implementation and 
comparators on primary outputs only; and (3) Identical 
state encoding, parity prediction for the flip-flops, diverse 
combinational logic implementation, comparators on 
primary outputs and parity checkers on flip-flop outputs. 
Our results for the simulated sequential benchmark 
circuits demonstrate that the third approach is most 
efficient in protecting sequential logic circuits against 
multiple and common-mode failures. The computational 
complexity of the data integrity analysis of the third 
approach is of the same order as that of the first approach 
and is at least an order of magnitude less than that of the 
second approach. 

1. Introduction 

Concurrent Error Detection (CED) techniques are 
widely used for designing systems with high data integrity. 
By data integrity, we mean that the system either produces 
correct outputs or generates an error signal when incorrect 
outputs are produced. A duplex system in the form of a 
self-checking pair is a classical example of a CED scheme 
which has been used for guaranteeing data integrity in 
many applications like the IBM G5 and G6 processors 
[Spainhower 99]. Figure 1.1 shows the basic principle of 
operation of a duplex system. As long as only one module 
fails, a duplex system provides guaranteed data integrity.




[Figure 1.1. A duplex redundant system (module outputs feed a comparator that generates the error signal)]



It is generally assumed that module failures are 
independent events; hence, in a duplex system, the 
probability that both modules fail is very low for realistic 
failure rates. However, this assumption is not always true. 
In a duplex system, common-mode failures (CMFs) result 
from failures that affect both modules at the same time,
generally due to a common cause [Lala 94]. These include 
operational failures due to external (such as EMI, power- 
supply disturbances, radiation) or internal causes and 
design mistakes. CMFs are surveyed in [Mitra 00a]. 

Design diversity was proposed and used in the past to 
protect redundant systems against common-mode failures 
[Avizienis 84, Briere 93, Riter 95]. In [Avizienis 84], 
design diversity was defined as the independent generation 
of two or more software or hardware elements (e.g., 
program modules, VLSI circuit masks, etc.) to satisfy a 
given requirement. The basic idea is that, with different
implementations, common failure modes will cause 
different error effects. 

The conventional notion of diversity is qualitative and 
does not provide any quantitative insight into design of 
diverse duplex systems. In [Mitra 99a], a metric was 
developed to quantify design diversity and analyze the 
reliability, availability and data integrity of duplex systems 
using this metric. In [Mitra 00b], this metric was used as a 
cost function to synthesize diverse implementations of 
combinational logic functions. However, the efforts on 
characterization of diverse duplex systems were focused on 
combinational logic circuits. In this paper, we extend our 
ideas to sequential logic circuits. 

This work was done as part of the ROAR (Reliability 
Obtained by Adaptive Reconfiguration) project [Saxena 
00]. In the project, the system under consideration is 
reconfigurable and contains user-programmable logic 
elements (e.g., FPGAs). For such systems, faults can be 
detected during system operation, the faulty part can be 
located, and the system can be reconfigured to operate 
without using the defective part. The Field Replaceable 
Unit (FRU) is a programmable logic block or a routing 
resource, instead of a chip or a board used in any 
conventional fault-tolerant system. Hence, it is reasonable 
to design combinational or sequential logic with concurrent 
error detection such as duplication. 

In Sec. 2, we describe three approaches to designing 
sequential logic circuits with CED based on diverse 
duplication and present simulation results comparing these
three schemes. Section 3 describes a technique to analyze 
the data integrity of sequential logic circuits with CED. 
We conclude in Sec. 4. 
2. Diverse Duplication for Sequential Logic Circuits 
We consider the Finite State Machine (FSM) model of 
sequential circuits [McCluskey 86] as shown in Fig. 2.1.
In addition, we assume that faults do not affect the clock 
signal (not shown in Fig. 2.1) in the FSM implementations. 
While our technique can be extended for faults on clock 
signal lines, this assumption is reasonable when fault- 
tolerant clocks [Siewiorek 92] are used. 



[Figure 2.1. FSM model of a sequential circuit (blocks: Primary Inputs, Flip-Flops, Next-State & Output Logic, Outputs)]

Various techniques have been proposed in the past to 
implement concurrent error detection in sequential circuits. 
These include techniques based on parity prediction, 
Berger and Bose-Lin codes. [Zeng 99] presents a 
comprehensive description of these previously reported 
CED techniques for sequential logic circuits. Results 

presented in [Mitra 00c] demonstrate that, for general 
combinational logic circuits, CED techniques based on 
diverse duplication provide better protection against 
multiple failures and CMFs compared to simple 
duplication and parity prediction; moreover, the area 
overhead of diverse duplication is comparable to (or 
marginally more than) that of parity prediction. Hence, in 
this paper we study CED techniques based on diverse 
duplication for sequential logic circuits. 

2.1. Identical State Encoding and Diverse Logic 
(ISEDL) 

In Fig. 2.2 both implementations have identical 
encoding of the FSM internal states; however, we have 
diverse implementations of the next-state and the output 
logic. The primary outputs and the state-bits (flip-flop 
outputs) of the two implementations are compared and an 
error is indicated when a mismatch occurs. 

[Figure 2.2. Identical state encoding and diverse logic (ISEDL): two implementations, each with Primary Inputs, Flip-Flops, and Next-State & Output Logic; comparators on the flip-flop outputs and on the primary outputs produce the Match/Error signals]



For synthesizing diverse implementations of the next- 
state and the output logic the technique in [Mitra 00b] can 
be used. This CED technique suffers from the problem 
that there is no diversity in the state encoding (i.e., the flip- 
flop contents). In the worst case, for a fault f affecting a
flip-flop in the first implementation, a fault g affecting the
corresponding flip-flop in the second implementation can
be identified, such that the fault pair (f, g) can never be
detected by the comparator; this situation is not desirable.

2.2. Diverse State Encoding and Diverse Logic 
(DSEDL) 

Diversity can be created by encoding the internal 
states of the given FSM in "different" ways in the two 
implementations. This provides another degree of freedom 
in the synthesis of FSMs with CED based on diverse 
duplication and can possibly help in providing enhanced 
protection against CMFs compared to the scheme in Fig. 
2.2. This scheme is shown in Figure 2.3. Since the 
encoding of the internal states of the FSM are not identical 
in the two implementations, simple self-checking 
comparator designs cannot be used to check the flip-flop 
outputs - the comparator design can be very complex. 
This can degrade the capability of this technique to detect 
multiple failures and CMFs. 

[Figure 2.3. Diverse state encoding and logic implementation (DSEDL)]

The encoding of the internal states of the second 
implementation can be looked upon as a transformation of 
the encoding of the internal states of the first 
implementation. Formally, if $E_1(s)$ represents the
encoding of state $s$ in the first implementation, and $E_2(s)$
represents the encoding of state $s$ in the second
implementation, then $E_2(s) = T(E_1(s))$. If $T$ is a "simple"
transformation (e.g., linear transformation consisting of XOR
gates only), then we can design inexpensive checkers (e.g.,
parity trees) to check the flip-flop outputs.
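
As a minimal sketch of the linear-transformation idea, the Python fragment below derives a second state encoding from a first one using XOR combinations only, and then checks flip-flop contents by recomputing the transformation. The three-state encoding `E1`, the matrix `T`, and the helper names are hypothetical illustrations, not values or algorithms from the paper.

```python
def xor_transform(matrix, bits):
    """Apply a linear map over GF(2): output bit k is the XOR of the
    input bits selected by the 1-entries of row k."""
    return tuple(sum(m & b for m, b in zip(row, bits)) % 2 for row in matrix)

# Hypothetical 2-bit encoding E1 of a 3-state FSM (illustration only).
E1 = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}

# Hypothetical invertible transformation T: (b0, b1) -> (b0 XOR b1, b1).
T = [(1, 1), (0, 1)]

# Encoding of the second implementation: E2(s) = T(E1(s)).
E2 = {state: xor_transform(T, code) for state, code in E1.items()}

def flip_flops_consistent(code1, code2):
    """Checker sketch: the flip-flop contents of the two implementations
    agree when transforming the first code yields the second."""
    return xor_transform(T, code1) == tuple(code2)
```

Because each output bit of `T` is an XOR of state bits, a hardware checker for this relation would need only XOR gates, which is the sense in which the transformation is "simple".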

2.3. Diverse Duplication for Output Logic; Parity
Prediction for Next-State Logic
(PPNSDOL & PPDL)

The CED technique ISEDL (Sec. 2.1) has the 
following advantages over the technique DSEDL (Sec. 
2.2): (1) The flip-flop outputs in the two implementations 
can be compared; hence, if a fault-pair produces non- 
identical next state outputs, it will be detected; (2) As will 
be illustrated in Sec. 3, the computational complexity of 
the analysis of the ISEDL technique is much less than that 
of the DSEDL technique. However, the ISEDL technique 
suffers from the problem of having no diversity in the flip- 
flop contents. 
The CED scheme of this section combines the
advantages of the ISEDL and DSEDL techniques. We use 
diverse duplication for the output logic and parity 
prediction for the next-state logic of the FSM 
implementation. Figures 2.4a and 2.4b show two 
implementations of this CED scheme. 

In Fig. 2.4a we use simple parity prediction for the 
next-state logic (with the appropriate constraints on logic 
sharing). The technique in [Mitra 00b] can be used for 
synthesizing the diverse implementations of the output 
logic; the technique in [Touba 97] can be used for 
synthesizing the next-state logic with parity prediction. 
This technique is called PPNSDOL (Parity Prediction for
Next State Logic and Diverse Output Logic).

In Fig. 2.4b, we use diverse duplication for the next 
state logic also and check the outputs of the two 
implementations using a comparator. Then, we add one or 
more parity trees at the outputs of one of these 
implementations to generate parity bits. This technique is 
called PPDL (Parity Prediction and Diverse Logic). The 
PPDL technique provides more protection from multiple 
failures and CMFs affecting the next-state logic compared 
to the PPNSDOL technique (Fig. 2.4a) [Mitra 00a].
Figure 2.4. Diverse duplication of sequential logic circuits
with parity prediction on flip-flops. (a) PPNSDOL (b) PPDL.
2.4. Simulation Results 

In Table 2.1 we report the area overhead of the CED 
schemes for some MCNC FSM benchmark circuits and the 
IEEE 1149.1 Boundary-Scan TAP controller [Parker 92]
(named TAP). 

The circuits were synthesized using the Sis tool 
[Sentovich 92]. For synthesizing diverse implementations 
of the FSM next-state and output logic we synthesized truth 



tables with true and complemented outputs using the Sis
tool. We used espresso for two-level minimization,
rugged.script for multi-level optimization and the LSI
Logic G10p library [LSI 96] for technology mapping. For
synthesizing FSMs with diverse state encoding, we used 
two different state encoding algorithms nova [Villa 90] and 
jedi [Lin 89]. For synthesizing parity prediction for next- 
state logic we used the technique described in [Touba 97]. 
For most cases, the PPDL technique (Fig. 2.4b) generates 
circuits with less area overhead compared to PPNSDOL 
(Fig. 2.4a); hence, area results for the PPNSDOL technique 
are not shown in Table 2.1.



Table 2.1. Area overhead of the CED schemes (each entry: combinational logic area, number of flip-flops)

Circuit Name   ISEDL      DSEDL      PPDL
TAP            371, 8     406, 8     390, 5
bbsse          801, 8     780, 8     841, 5
cse            1159, 8    1127, 8    1195, 5
beecount       197, 6     208, 6     222, 4
dk14           506, 6     559, 6     526, 4
ex1            1654, 10   1639, 10   1704, 6


Next, we present simulation results on the 
vulnerability of CED techniques to multiple failures and 
CMFs. In dependable systems, it is realistic to assume that 
a corrective action is initiated after the system generates an 
error signal. Thus, for any system with CED, data integrity 
is guaranteed as long as the system does not produce an 
undetected corrupt output before indicating an error. 

For each fault pair $(f_i, f_j)$ affecting the FSM, for each
primary input sequence, the FSM behaves in one of the
following ways: (1) it produces only correct outputs; (2) it
produces an error signal before producing an undetected
erroneous output; (3) it produces an undetected erroneous
output before producing an error signal. Let $y_{ij}$ be the
fraction of input sequences for which the system produces
only correct outputs; let $z_{ij}$ be the fraction of input
sequences for which the system produces an error signal
before producing an undetected erroneous output. We
define the term

$$w_{ij} = \frac{z_{ij}}{1 - y_{ij}}$$

for the fault pair $(f_i, f_j)$ as
the detected fraction or incorrect output detectability,
which is the fraction of primary input sequences producing
erroneous outputs for which the system data integrity is
maintained. If the value of this term is 1, the system either
produces correct outputs or indicates erroneous situations
when incorrect outputs are produced. If the value is 0, the
system never produces any error signal when incorrect
outputs are produced. Note that, if a CED-based system
produces correct outputs for all input combinations even in
the presence of a fault, then the fault is redundant.
Similarly, for each fault pair $(f_i, f_j)$, we define the
probability of undetected error as $x_{ij} = 1 - y_{ij} - z_{ij}$.
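
The two metrics above can be computed from the fractions of correct and detected input sequences. The following Python sketch assumes the detected fraction is the detected share of all sequences that produce erroneous outputs, i.e. z / (1 - y); the function name and the handling of redundant faults (returning `None` when no sequence produces an erroneous output) are illustrative choices, not part of the paper.

```python
def detectability_metrics(y, z):
    """Return (w, x) for one fault pair.

    y: fraction of input sequences producing only correct outputs
    z: fraction producing an error signal before any undetected
       erroneous output
    w: detected fraction, z / (1 - y); None when y == 1, because no
       sequence produces an erroneous output (redundant fault pair)
    x: probability of undetected error, 1 - y - z
    """
    if not (0.0 <= y <= 1.0 and 0.0 <= z <= 1.0 and y + z <= 1.0 + 1e-12):
        raise ValueError("y and z must be fractions with y + z <= 1")
    x = 1.0 - y - z
    w = None if y >= 1.0 else z / (1.0 - y)
    return w, x
```

For example, with y = 0.5 and z = 0.2, half of the sequences expose the fault pair, 40% of those are caught before a corrupt output escapes, and the probability of undetected error is 0.3.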

We used the following simulation procedure. For each
single-stuck-at fault $f_i$, we simulated exhaustively all fault
pairs to identify another single-stuck-at fault $f_j$ in the same
circuit that had the minimum value of $w_{ij}$ or $x_{ij}$. Hence,
the fault pair $(f_i, f_j)$ can be regarded as a worst-case fault
pair. Finally, we averaged the $w_{ij}$'s (or $x_{ij}$'s) over all the
worst-case fault pairs. The primary input sequences during
simulation were applied in the following way. For each
state $s$ of the implemented FSM, we initialized the FSM to
state $s$ and applied 500-1000 pseudo-random primary input
sequences generated by an LFSR; each primary input
sequence had a length of 100-200. The results for fault
pairs in the combinational logic parts are shown in Table
2.2. The benchmark circuits are small enough that the
simulation procedure can be completed. For Table 2.2, the
results for the CED techniques PPNSDOL and PPDL (Fig.
2.4a and 2.4b) are not shown separately because the results
for PPDL (Fig. 2.4b) are the same as those for ISEDL (Fig.
2.2). Moreover, as discussed in [Mitra 00c], the results for
PPNSDOL are worse than those for PPDL.
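
The worst-case selection and averaging step of this procedure can be sketched as follows. The sketch assumes the per-pair detected fractions have already been obtained by fault simulation and are stored in a nested dictionary; the function name, the data layout, and the toy numbers in the usage note are hypothetical.

```python
def average_worst_case(w):
    """Average of worst-case detected fractions.

    w maps each fault f_i to a dict {f_j: w_ij}. For every f_i the
    partner f_j with the minimum w_ij is taken as the worst case, and
    the minima are averaged over all f_i. Entries equal to None
    (fault pairs that never corrupt an output) are skipped.
    """
    worst = []
    for partners in w.values():
        values = [v for v in partners.values() if v is not None]
        if values:
            worst.append(min(values))
    if not worst:
        raise ValueError("no fault pair produced an erroneous output")
    return sum(worst) / len(worst)
```

With toy values `{"f1": {"f2": 0.5, "f3": 0.25}, "f2": {"f1": 0.75, "f3": 1.0}}` the worst cases are 0.25 and 0.75, so the reported average is 0.5.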



Table 2.2. Simulation results for fault pairs in the combinational logic

               Incorr. O/p Detectability    Prob. Undet. Error
Circuit Name   ISEDL, PPDL    DSEDL         ISEDL, PPDL    DSEDL
TAP            0.4            0.06          0.6            0.94
beecount       0.33           0.37          0.38           0.33
cse            0.46           0.46          0.11           0.12
dk14           0.54           0.54          0.34           0.38


The results of Table 2.2 indicate that, for the simulated 
designs, the protection provided by the ISEDL or PPDL 
techniques against multiple failures or CMFs in the 
combinational logic is better than or comparable to that of 
the DSEDL technique (diverse state encoding). The 
DSEDL technique has very low incorrect output 
detectability for the TAP controller FSM. This is mainly 
due to the fact that, for the DSEDL technique, the
combinational logic can produce non-identical errors on
the flip-flop inputs; however, since there is no "easy way"
to check the flip-flop contents, these errors cannot be
detected and the faults eventually produce
identical errors. Table 2.3 shows simulation results for
faults affecting only the flip-flop outputs.



Table 2.3. Simulation results for faults affecting only the flip-flop outputs

               Incorr. O/p Detectability            Prob. Undet. Error
Circuit Name   ISEDL   DSEDL   PPNSDOL, PPDL        DSEDL   PPNSDOL, PPDL
TAP            0       0.74    0.54                 0.26    0.46
cse            0       0.30    0.43                 0.14    0.34
dk14           0       0.40    0.49                 0.6     0.51
dk16           0       0.48    0.54                 0.52    0.46
ex1            0       0.58    0.48                 0.42    0.52


effectiveness of the PPDL technique of Fig. 2.4b (diverse 
combinational logic implementation, parity prediction for 
flip-flops and generation of parity bit through an XOR-tree 
from a next-state logic implementation) for implementing 
CED in the simulated designs. 

It may be noted here that, if transient faults create bit-
flips (rather than bit-stucks) in the flip-flops of a sequential
circuit, then the CED technique based on diverse state
encoding with linear transformations, which
is an extension of the idea of parity prediction as described
at the end of Sec. 2.2, is expected to outperform the other
techniques (ISEDL, PPNSDOL or PPDL) so far as data
integrity is concerned.

In the next section we describe a formal technique for 
analyzing each of the CED schemes; the discussion also 
shows that the computational complexity of analyzing the 
DSEDL technique is at least an order of magnitude higher 
than that of the ISEDL, PPNSDOL or PPDL techniques. 
3. Analysis of CED schemes 

Suppose that we are given two implementations $N_1$
and $N_2$ of an FSM $M$. The FSM $M$ can be characterized by
a state table [McCluskey 86], which can be formally
represented by the set $\{I, O, S, T, L\}$. Here, $I$ is
the set of primary input combinations, $O$ is the set of
primary output combinations and $S$ is the set of internal
states. $T$ is the transition logic, which can be looked upon
as a mapping from $S \times I$ to $S$. $L$ is the output logic, which
can be represented as a mapping from $S \times I$ to $O$. An input
distribution of an FSM is given by the conditional
probability distribution $P(i \mid s)$ for all $i \in I$ and $s \in S$. $P(i \mid s)$ is
the conditional probability that a primary input
combination $i \in I$ is applied to the FSM when it is in state $s$.
For the current paper, we assume that all primary input
combinations are equally likely for all states. However, for
specific systems, the input distribution can be
approximated using trace simulations.

[Figure 3.1: state diagram. Edges are labeled input/output (input probability), e.g. 1/0 (0.5) and 1/1 (0.5).]

Figure 3.1. Example FSM

For example, consider the FSM in Fig. 3.1.
For this FSM, $S = \{X, Y\}$, $I = \{0, 1\}$ and $O = \{0, 1\}$. The
next-state logic $T$ is given by $T(X, 0) = X$, $T(X, 1) = Y$,
$T(Y, 0) = Y$ and $T(Y, 1) = X$. The output logic $L$ is $L(X, 0) = 0$,
$L(X, 1) = 0$, $L(Y, 0) = 0$ and $L(Y, 1) = 1$. For any
state, the probability that the primary input has value 0 (or
1) is 0.5. Figures 3.2a and 3.2b show two implementations
$N_1$ and $N_2$ of the FSM in Fig. 3.1. If we use these two
implementations for CED, we have a DSEDL CED
technique.

Let us suppose that faults $f$ and $g$ affect
implementations $N_1$ and $N_2$, respectively. We can
construct faulty FSMs $M_f = \{I, O, S_f, T_f, L_f\}$ and
$M_g = \{I, O, S_g, T_g, L_g\}$ in the presence of $f$ and $g$, respectively. The
two faulty FSMs are shown in Fig. 3.2c and 3.2d,
respectively. Next, we can construct the product machine
$K = M \times M_f \times M_g$ as follows. The set of states of $K$ is given
by $K_S = S \times S_f \times S_g$, i.e., each state of $K$ can be represented as
a tuple $(a, b, c)$, where $a \in S$, $b \in S_f$ and $c \in S_g$. The transition
logic $K_T$ of $K$ is given by the following mapping:
$K_T[(a, b, c), i] = [T(a, i), T_f(b, i), T_g(c, i)]$, where
$(a, b, c) \in S \times S_f \times S_g$ and $i \in I$. The output logic $K_L$ of the
product FSM $K$ is a mapping from $K_S \times I$ to $O \times O \times O$ and is
defined by $K_L[(a, b, c), i] = [L(a, i), L_f(b, i), L_g(c, i)]$, where
$(a, b, c) \in S \times S_f \times S_g$ and $i \in I$. The input distribution of
the product FSM $K$ is defined as $P[i|(a, b, c)] = P(i|a)$ in FSM $M$,
where $(a, b, c) \in S \times S_f \times S_g$ and $i \in I$. Figure 3.3 shows the
product FSM $K$ for the example in Fig. 3.2.

Figure 3.2. FSM implementations with faults. (a)
Implementation with fault f. (b) Implementation with fault g.
(c)-(d) State diagrams of the implementations with faults f and g.

Figure 3.3. Product FSM: Good, Bad, Detecting transitions

The state transitions of the product FSM K can be
classified into the following categories: (1) good (G)
transitions, (2) detecting (D) transitions, and (3) bad (B)
transitions.

A transition from state $(a, b, c)$ under input
combination $i$ is a good transition if the output produced
by the FSM $M$ is the same as the outputs produced by $M_f$
and $M_g$. Formally, a transition
$K_T[(a, b, c), i] = [T(a, i), T_f(b, i), T_g(c, i)]$ is a good
transition if and only if $L(a, i) = L_f(b, i) = L_g(c, i)$.

A transition from state $(a, b, c)$ under input
combination $i$ is a detecting transition if the outputs
produced by $M_f$ and $M_g$ are different. Formally, a
transition $K_T[(a, b, c), i] = [T(a, i), T_f(b, i), T_g(c, i)]$ is a
detecting transition if and only if $L_f(b, i) \neq L_g(c, i)$.

A transition from state $(a, b, c)$ under input
combination $i$ is a bad transition if $M_f$ and $M_g$ produce
identical erroneous outputs (different from the output
produced by $M$). Formally, a transition
$K_T[(a, b, c), i] = [T(a, i), T_f(b, i), T_g(c, i)]$ is a bad
transition if and only if $L(a, i) \neq L_f(b, i)$ and
$L_f(b, i) = L_g(c, i)$.
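
The three definitions can be illustrated with a minimal Python sketch. The fault-free machine is the FSM of Fig. 3.1; the two faulty machines `M_F` and `M_G` are hypothetical stand-ins chosen for illustration (the actual faults f and g of Fig. 3.2 are not recoverable from the text), and the function names are ours.

```python
# Fault-free FSM of Fig. 3.1: (state, input) -> (next state, output).
M = {("X", 0): ("X", 0), ("X", 1): ("Y", 0),
     ("Y", 0): ("Y", 0), ("Y", 1): ("X", 1)}

# Hypothetical faulty machines (assumptions, not the faults of Fig. 3.2).
M_F = dict(M)
M_F[("Y", 1)] = ("X", 0)   # output is wrong for (Y, 1)
M_G = dict(M)
M_G[("X", 1)] = ("X", 0)   # next state is wrong for (X, 1)


def step(state, i):
    """Transition logic K_T of the product machine."""
    a, b, c = state
    return (M[(a, i)][0], M_F[(b, i)][0], M_G[(c, i)][0])


def classify(state, i):
    """Label a product-machine transition as good, detecting or bad."""
    a, b, c = state
    out, out_f, out_g = M[(a, i)][1], M_F[(b, i)][1], M_G[(c, i)][1]
    if out_f != out_g:
        return "D"          # the two implementations disagree
    if out_f == out:
        return "G"          # all three outputs agree
    return "B"              # identical but erroneous outputs


if __name__ == "__main__":
    for i in (0, 1):
        for state in [("X", "X", "X"), ("Y", "Y", "X"), ("Y", "Y", "Y")]:
            print(state, i, classify(state, i), step(state, i))
```

In this hypothetical example the product state (Y, Y, X) is reached from (X, X, X) through a good transition, and its transition under input 1 is bad; this is the situation that comparing flip-flop outputs is meant to catch.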

Figure 3.3 shows the labels of the transitions of the 
product FSM K. For a CED technique based on diverse 
logic implementation but identical state encoding, the 
outputs of the corresponding flip-flops in the two 
implementations can be compared. This means that any
state $(a, b, c)$ in the product machine $K$ detects the presence
of a fault if $b \neq c$. All such states can be merged into a
single state Detected. This reduction is not possible for a
CED scheme with diverse state encoding unless there is an
"easy" way to check that both the implementations are in
the same state. All detecting transitions in the product 
machine K can be redirected to the Detected state; all 
edges starting from the states that are merged into the 
Detected state can be deleted. There is no outgoing edge 
from the Detected state. All bad transitions in the product 
FSM K can be redirected to a new state Error. There is no 
outgoing edge from the Error state. After these reductions, 
all unreachable states and edges starting from them in the 
final FSM can be deleted. Figure 3.4 illustrates these 
reduction techniques for the product FSM in Fig. 3.3 for 
the case when the internal states of the two FSMs are 
checked. The system never enters the Error state, and the
data integrity in the presence of the fault pair is 1.




Figure 3.4. Reduced FSM with comparator states 

The data integrity of the CED system at time $t$ in the
presence of faults can be defined in the following way. For
each state $s$ of the original fault-free FSM, we identify the
state $S = (a, b, c)$ in the product FSM such that $a = s$, and $b$
and $c$ are the corresponding states in the two
implementations with faults; next, we calculate the
probability $E(S, t)$ of being in the Error state in the product
FSM at time $t$ starting from state $S$. This can be calculated
using straightforward Markov analysis techniques and tools
like SHARPE (http://www.ee.duke.edu/~kst). The data
integrity of the CED system in the presence of a given fault
pair is equal to

\[ \sum_{s} P(s)\,[1 - E(S, t)]. \]

Here, $P(s)$ is the
stationary probability of state $s$ in the original fault-free
FSM. For very low failure rates, it is realistic to assume 
that the original FSM reaches a stationary probability state 
before a fault affects the FSM. Analysis of CED schemes 
based on diverse duplication of output logic and parity 
prediction of next-state logic is similar to the analysis 
technique described above and is not repeated. 
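
As a rough illustration of the Markov calculation, the sketch below propagates a probability distribution over a reduced product FSM for $t$ steps and then forms the weighted sum over the stationary distribution. The chain used here is a hypothetical three-state example with absorbing `Error` and `Detected` states, not the machine of Fig. 3.4, and the function names are assumptions.

```python
ABSORBING = ("Error", "Detected")


def error_probability(chain, start, t):
    """E(S, t): probability of being in Error after t steps from start."""
    dist = {start: 1.0}
    for _ in range(t):
        nxt = {}
        for state, p in dist.items():
            if state in ABSORBING:          # no outgoing edges
                nxt[state] = nxt.get(state, 0.0) + p
                continue
            for q, target in chain[state]:
                nxt[target] = nxt.get(target, 0.0) + p * q
        dist = nxt
    return dist.get("Error", 0.0)


def data_integrity(stationary, start_of, chain, t):
    """Sum over s of P(s) * [1 - E(S, t)]."""
    return sum(p * (1.0 - error_probability(chain, start_of[s], t))
               for s, p in stationary.items())


# Hypothetical reduced product FSM: state -> [(probability, next state)].
chain = {
    "XXX": [(0.5, "XXX"), (0.5, "YYX")],
    "YYX": [(0.5, "YYX"), (0.5, "Error")],
    "YYY": [(0.5, "YYY"), (0.5, "Detected")],
}
stationary = {"X": 0.5, "Y": 0.5}      # equally likely inputs, Fig. 3.1
start_of = {"X": "XXX", "Y": "YYY"}    # product state matching each s

if __name__ == "__main__":
    print(data_integrity(stationary, start_of, chain, t=2))
```

For larger machines the same calculation is usually written as repeated multiplication by a transition matrix; the dictionary form is used here only to keep the sketch dependency-free.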

3.1. Computational Complexity of the Analysis

Theoretically, the above analysis technique is 
computationally intensive because of the following 
problems. The analysis technique may run into memory 
problems due to possible state space explosion during the 
computation of the product FSM. For example, if the
original FSM has 64 states, it is theoretically possible that
the product FSM will have $64^3 = 262{,}144$ states if we use
the DSEDL technique (without comparators comparing the
flip-flop outputs). Moreover, if the original FSM has a
large number of primary inputs, then the construction of 
the product FSM will be very time consuming if we have to 
compute the state transition of the product FSM from each 
state for each primary input combination. In Table 3.1, we
show the characteristics of the 1149.1 Boundary-Scan TAP
controller and the MCNC FSM benchmark circuits, together
with the average and the maximum number of states in the
product FSM over all single stuck-at fault pairs.

Most of the FSM benchmarks in the MCNC
benchmark suite have no more than 32 states.
The TAP FSM has 16 states and a single primary
input. A similar observation can be made about the
internal benchmark FSM specifications of CAD 
companies. This is perhaps because FSMs used in real 
designs are designed as interacting state machines. 

Table 3.1. Characteristics of designs for which exact
analysis of ISEDL, PPNSDOL and PPDL is feasible

Circuit    # PI, # PO,    Avg. # states in    Max. # states in
Name       # States       product FSM         product FSM
TAP        1, 7, 16       22                  90
cse        7, 7, 16       23                  140
dk16       2, 3, 27       45                  342
ex1        9, 19, 18      25                  160
sand       11, 9, 32      60                  600


However, there are some FSM specifications with
approximately 97 or 135 states; moreover, for
FSMs with a large number of primary inputs, an exact
analysis for each input combination can be very time
consuming (for example, FSMs s420, s510, s820 and scf, with
19, 19, 18 and 27 primary inputs, respectively). For these FSM
benchmarks, approximate techniques must be devised.
4. Conclusions 

We studied the problem of implementing concurrent 
error detection (CED) based on diverse duplication in 
sequential logic circuits. We examined three different 
techniques for this purpose. Our simulation results 
demonstrate that the CED technique based on diverse 
duplication of combinational logic and parity prediction of 
flip-flop contents is most efficient in protecting sequential 
logic circuits against multiple and common-mode failures. 
We also described an exact technique to analyze the data 
integrity of sequential logic circuits with CED. Our results 
on MCNC benchmark circuits show that the exact analysis 
technique is feasible for many (80 %) benchmark circuits 
although theoretically it can suffer from state space 
explosion problems. Future research must focus on
extending the idea of parity prediction for next-state logic
to generate "simple" transformations for diverse state
encoding and on developing efficient analysis techniques that
do not suffer from state explosion problems and can handle
FSM specifications with a large number of primary inputs.

Abstract: The authors have previously proposed a new
superconducting voltage-state logic family called complementary
output switching logic (COSL). This logic family has been
designed using a Monte Carlo optimization process such that
circuits have a high theoretical yield at 5-10 Gb/s clock speeds
in spite of existing Josephson process variations. In the present
work the Monte Carlo optimization process is described and
theoretical yields are calculated for the COSL 2- and 3-bit
encoder circuits. The circuit simulations use 5-10-GHz sinusoidal
clocks and measured global and local process variations. The 2-bit
encoder results are compared to modified variable threshold logic
(MVTL) circuits and demonstrate that COSL circuits should have
a significantly higher theoretical yield than MVTL at 10 Gb/s.
Design rules for optimal COSL circuit layouts are also given,
and experimental data are presented for 2-bit encoder circuits
operating at multigigahertz clock frequencies. HSPICE is used for
all Monte Carlo simulations and the Josephson junction model
is given in the Appendix.

Index Terms: Monte Carlo methods, superconducting device
testing, superconducting integrated circuits, yield optimization.

I. Introduction 

PRACTICAL applications of superconducting logic will 
require digital circuits that can operate at 10-Gb/s clock 
speeds and beyond. Unfortunately, Josephson circuits are espe- 
cially sensitive to process variations and, in the case of voltage- 
state logic, increasing clock speeds beyond 2 Gb/s tends to 
seriously degrade circuit margins [1]. We have proposed a 
new type of voltage-state logic called complementary output 
switching logic (COSL) [2], [3]. These circuits were optimized
for 5-10-Gb/s operation using a Monte Carlo method so that
they are relatively robust to process variations. In the present
work the Monte Carlo optimization method is described in 
detail and is applied to 2- and 3-bit encoder circuits for a flash 
analog-to-digital converter (ADC). 

A number of factors combine to make reaching the goal 
of 10-Gb/s superconducting circuits challenging. At a funda- 
mental level the primary roadblocks have been flux trapping 
and process variations. Trapped flux in or near Josephson 
junctions significantly depresses junction critical currents and 
can, by reducing the overall circuit margins, prevent large 
circuits from operating. We have previously studied the flux
trapping problem in detail and demonstrated that with good 
shielding and moats one can practically eliminate flux trapping 
in the Josephson circuits [4]. Process variations can also 
significantly reduce circuit margins and can prevent large 
digital circuits from operating correctly [5], [6]. We have, 
therefore, developed a circuit optimization method which 
explicitly includes process variations. 

We combine experimental measurements on process spreads 
with Monte Carlo simulations. The COSL gates are optimized 
using a Monte Carlo method, and we iterate between basic 
gates and complex circuits to optimize the yield of large 
circuits. Simulation examples are given for 2- and 3-bit 
encoder circuits, and experimental test results for 2-bit encoder 
circuits operating at 1-4 Gb/s are presented. Design rules 
for optimal COSL circuit design are also discussed. The 
simulations demonstrate that COSL circuits should have a 
significantly higher theoretical yield than modified variable 
threshold logic (MVTL) circuits [7], [8] in the clock frequency 
range 5-10 GHz. Note that while we specifically apply Monte
Carlo optimization to voltage-state logic, the optimization
technique is also applicable to rapid single flux quantum
(RSFQ) circuits [9]-[11].

We review the basic COSL gates in the following section. 
Section III describes experimental testing results to determine
$3\sigma$ local process variations in critical current and resistance,
and the basic Monte Carlo optimization method is described.
Two examples are given in Section IV: 2-bit and 3-bit encoders
for fully parallel flash ADC's. The 2-bit encoder
is compared to a similar MVTL encoder, and yields from 
Monte Carlo calculations are given for various process spreads. 
Design rules, circuit layouts, and experimental test results 
are described in Section V, and a summary and conclusion 
are given in Section VI. The HSPICE model used for the 
simulations is listed in the Appendix. 

II. Review of COSL Gates 

We first briefly review the basic ideas of the COSL family 
[2], [3]. Fig. 1(a) shows the OR/AND gate, and Fig. 1(b) the 
NOR/NAND gate. The XOR function is derived from the OR
gate by including a 300-µA Josephson junction in series with
the inputs, Fig. 1(a). All of the gates consist of a one-junction
SQUID input stage and a two-junction SQUID output stage 
[12]. The two-junction SQUID in the output stage is connected 
in series with a Josephson junction. The COSL circuits are 
designed to use a three-phase sinusoidal clocking scheme, and
the input and output stages of the gates use two of the clock
phases applied through the clock shaping junctions. These
junctions have the effect of clamping the SQUID biases at 2.5
mV when the clocks are applied, independent of the process
variations.

Fig. 1. Schematic diagrams of: (a) COSL OR/AND gate and (b) NOR/NAND gate.

The operation of the OR gate is understood intuitively as
follows. When clock 1 is applied, an input to the gate greater
than 60 µA is sufficient to fire the one-junction SQUID.
Switching the one-junction SQUID causes a relatively large
current to flow in the inductor, which is coupled to the output
two-junction SQUID loop. The one-junction SQUID current
suppresses the critical current of the two-junction SQUID so
that when clock 2 is applied, the two-junction SQUID will
switch, giving 1 mV at the output, which produces 200 µA
in a 5-Ω load.

Note that there is a Josephson junction in series with the 
two-junction SQUID. Parameters are chosen such that the 
critical current of the series junction is less than the unmod- 
ulated two-junction SQUID and greater than the modulated 
two-junction SQUID. Therefore, when there is an input so 
the two-junction SQUID is fired, the SQUID becomes an 
effective high impedance and the series junction cannot fire. 
Conversely, if there is no input to the gate when clock 2 is 
applied, the series junction switches before the two-junction 
SQUID. Once it has switched, the series junction becomes an 
effective high impedance, preventing the two-junction SQUID 
from switching; in this case there is no output from the gate. 
The gate is termed complementary since for high output the 
two-junction SQUID is switched and not the series junction, 
and vice versa. 

The NOR gate in Fig. 1(b) is similar to the OR gate with 
the exception that the position of the output series junction 
and the two-junction SQUID are interchanged. Therefore, the 
one-junction SQUID firing has the opposite effect. When 
clock 2 is applied, and the critical current of the two-junction 
SQUID is suppressed due to the one-junction SQUID input, 
the two-junction SQUID switches. The two-junction SQUID 
becomes an effective high impedance so that there is not 
enough current to activate the series junction; the output from 
the gate is clamped at zero. Conversely, if there is no input 
to the gate, since the critical current of the unbiased two- 
junction SQUID is greater than that of the series junction, 
the latter switches before the SQUID and prevents it from 
switching. This corresponds to a high output of 1 mV into
a 5-Ω load.

The AND/NAND gates are derived from the OR/NOR gates
by changing the resistance $R_b$ of the clock supply for the
one-junction SQUID. For the OR/NOR gates $R_b = 8.8$ Ω and for
the AND/NAND gates $R_b = 14.3$ Ω. Increasing $R_b$ reduces
the amount of current supplied by clock 1 so that two inputs are
required to switch the one-junction SQUID input stage. The
gates in Fig. 1 were designed so that there is a fan-out of two,
where the output from each gate is typically 200 µA. Larger
fan-out is possible but would require different parameters in
the output stage.

The XOR function is derived from the OR/AND gate by
placing a 300-µA junction in series with the input to the
one-junction SQUID and setting $R_b = 9.8$ Ω. The XOR requires
that two OR/AND gates directly drive the input so that there
is a fan-in of one. A single gate input of 200 µA will not
switch the 300-µA XOR junction and, similar to the OR gate,
the one-junction SQUID will switch, leading to a high output
from the gate. However, if two inputs are applied simultaneously,
then the combined 400 µA will fire the XOR junction. This,
in turn, will reduce the input current so that, when clock 1 is 
applied, there will not be enough current input to activate the 
one-junction SQUID. Note that for correct operation, the XOR 
requires different clock phases than the other COSL gates. The 
optimal clocking scheme for the COSL circuits is described 
in Sections IV and V. 



III. Process Variations and Monte Carlo Optimization 

Process variations are important factors in the design of 
superconducting circuits. The fabrication process for super- 
conducting circuits includes many factors that contribute to 
variations in parameter values. Since circuits are designed 
for specific nominal parameter values, variations in individual 
parameters can prevent them from working correctly. In the 
present work we focus on the variation of parameters which 
are typically used in the design and simulation of digital 
superconducting circuits; these are resistance, critical current, 
and inductance. We anticipate that variations in these three 
parameters will have the most significant effect on circuit 
operation. Other variable factors, such as leakage current, 
are neglected. Variations in resistance, critical current, and 
inductance may result from the contribution of many factors 
including photolithography variations, point defects, and film 
deposition inhomogeneity [5], [6]. We categorize process 
variations into two main groups: global and local. 

Global variations are the average differences in a parameter
between chips. For example, if a process targets 1 Ω/□ sheet
resistance, and the average sheet resistance measured from
several resistors distributed across a chip is 1.1 Ω/□, then
there is a 10% global deviation in that chip. The three important
parameters (resistance, critical current, and inductance)
will all have independent chip-to-chip global variations. Also 
note that different chips will have different average deviations, 
whether from the same wafer or different wafers. However, 
in the present work we approximate the global chip-to-chip 
parameter variations for all chips from the same wafer, from 
measurements of the average global parameter variations of 
the wafer. 

Local variations are those between components in the same
chip or circuit and are in addition to global variations. As
an example, consider a chip with a critical current density
targeting 1 kA/cm² that has a measured global critical current
density of 1100 A/cm². If a single junction on the chip having
a 200-µA nominal critical current value is measured to have a
critical current of 230 µA, then it has a local variation of 5%
in addition to a 10% global variation.
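
The arithmetic behind this example can be checked with a short Python sketch. It assumes the multiplicative model used later for the simulations (nominal value times global factor times local factor); the function name is ours. Under that assumption the local deviation comes out near 4.5%, close to the 5% quoted above, while a purely additive reading (15% total minus 10% global) gives exactly 5%.

```python
def local_deviation(measured, nominal, global_deviation):
    """Relative deviation of one component from the chip-wide expected value."""
    expected = nominal * (1.0 + global_deviation)
    return measured / expected - 1.0


# Values from the example above.
global_dev = 1100.0 / 1000.0 - 1.0            # measured vs target J_c
local_dev = local_deviation(230.0, 200.0, global_dev)
additive_dev = 230.0 / 200.0 - 1.0 - global_dev

if __name__ == "__main__":
    print(f"global {global_dev:.3f}, local (multiplicative) {local_dev:.4f}, "
          f"local (additive) {additive_dev:.3f}")
```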

In order to relate the Monte Carlo simulations to experiments,
we analyze the process variations in the HYPRES
1 kA/cm² fabrication process [13]. HYPRES measures the
average critical current density for 12 junctions ranging from
3 × 3 µm² to 8 × 8 µm² in size on each wafer, and resistance
values are obtained from the average of two 10 × 50 µm²
resistors on each wafer [14]. These average critical current
density and resistance values are reported with the final 
fabricated chips. HYPRES therefore gives the average global 
variation for each wafer. 

The HYPRES design rule specification for global critical
current density $J_c$ is 30-5000 A/cm² ±15% and resistance $R$ is
1 Ω/□ ±20% [14]. Chips with measured $J_c$ and $R$ within these
design rule specifications are said to be within-specification, or
"in-spec." HYPRES ships at least one in-spec chip along with
a data sheet for that chip for each foundry run. As a favor to
its customers HYPRES may also, at its discretion, ship other
chips and specification sheets which may or may not meet the
foundry specification. According to HYPRES, the additional
chips or data on the corresponding specification sheets should
not be construed to be indicative of the process variations for
qualified in-spec chips [15].

JEFFERY et al.: MONTE CARLO OPTIMIZATION

[Figure 2, panels (a) and (b): bar graphs. Horizontal axes: (a) critical current density (10² A/cm²); (b) sheet resistance (Ω/sq).]

Fig. 2. Measured process variations in: (a) global critical current density and (b) global sheet resistance.


Fig. 2(a) and (b) show bar graphs of the tabulated HYPRES
global variation data in resistance and critical current density.
These data are compiled from the measured values reported by
HYPRES for chips purchased by the University of Rochester;
average measurements of resistance for 95 wafers and average
critical current density measurements for 106 wafers are given
in the bar graphs. These data include both in-spec and
out-of-spec chips shipped by HYPRES. The target resistance is
1 Ω/□ and critical current density is 1000 A/cm². The dashed
lines in Fig. 2(a) signify the range of in-spec chips defined
by the HYPRES $J_c$ design rule. All of the resistance data
in Fig. 2(b) are within the HYPRES resistance specification.
From these data the standard deviation of resistance is
$\sigma = 7.8\%$ with average 0.953 Ω/□. For critical current density
$\sigma = 12.5\%$ and the average is 1038 A/cm².

Fig. 2. (Continued) Measured process variations in (c) local variations in resistance and (d) local variations in critical current density. The dashed lines in
(a) signify the range of qualified chips within the $J_c$ design rule, and all nonzero data points in (b) are within specification. The local variation data (c) and
(d) include only in-specification chips. (Global data values (a) and (b) courtesy of M. Feldman and D. K. Brock.)

In an attempt to quantify local variations, we have designed
chips with identical resistors and Josephson junctions
distributed approximately uniformly across the chip. Five
resistance and five Josephson junction chips were fabricated
by HYPRES, and all five chips received by the University
of California, Berkeley, were within the HYPRES $J_c$ and
$R$ specification. Each resistance chip contained 13 nominal
5-Ω resistors, and the Josephson junction chips contained
15 nominal 200-µA junctions. Each Josephson junction was
surrounded by a 45 × 81 µm² moat to reduce flux trapping
[4], and four-point measurements were made on all resistors
and Josephson junctions. Each of the five chips was from a
different wafer, and the individual component measurements
were combined to obtain good statistics. The relative local
deviations in resistance $\Delta R$ and critical current $\Delta i_c$ for each
component were calculated from



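\[ \Delta i_c = \frac{i_c - \langle i_c \rangle}{\langle i_c \rangle} \quad \text{and} \quad \Delta R = \frac{R - \langle R \rangle}{\langle R \rangle} \tag{1} \]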



where $i_c$ is the critical current of the individual junction
and $\langle i_c \rangle$ is the average critical current of the components
on a single chip. Similarly, $R$ is the individual component
resistance and $\langle R \rangle$ is the average resistance of all resistors on
a chip. We therefore calculated the local variation for each
component on a chip relative to the measured average value. 

Fig. 2(c) and (d) show the distribution of local resistance
and critical current variations. The standard deviation of the
resistance measurements is $\sigma = 0.82\%$, and the local critical
current variations are more broadly distributed with $\sigma = 3.7\%$.
The average measured Josephson junction critical current was
186 µA, and the average resistance was 4.6 Ω. Note that these
data sets are small, and therefore our Gaussian statistics must 
be considered an approximation. Furthermore, since the local 
variation data are taken from only one HYPRES run, these 
data are not necessarily indicative of the general HYPRES 
process. However, for the simulations described in this paper, 
we will use these experimental data as an approximation to 
the local process variations. 

In the present work we do not explicitly analyze measured
inductance variations. Gaj and coworkers at the University
of Rochester have estimated that global inductance deviations
in the HYPRES process have an 8.5% $3\sigma$ [16]. This result
is an approximation obtained from numerical simulations
using estimates of the process deviations in layer thickness
and spacing. Polonsky at the State University of New York
has measured global inductance variations in the HYPRES
process, and he found that the deviations are within the
HYPRES specifications on metal and insulator thickness
variations. He reports on-chip, or local, variations well within
5% [17]. In the present work we approximate the worst case
global and local inductance deviations by 15% $3\sigma$ and 5%
$3\sigma$, respectively.

[Figure 3: horizontal axis is process parameter value.]

Fig. 3. Schematic of global and local process variations used in Monte
Carlo optimization. Each chip has a global deviation; we approximate the
chip-to-chip global variations using the wafer-to-wafer Gaussian distributions
[Fig. 2(a) and (b)]. In addition to global variations, components fabricated
on the same chip have different local variations which are also Gaussian
distributed. These local variations are in addition to the global variations,
shown schematically for the wafer C. The process deviations on a single chip
are therefore described statistically by the multiplication of the global and
local Gaussian distribution functions.

The local variation data in Fig. 2 have $3\sigma = 11\%$ and
$3\sigma = 2.5\%$ for critical current and resistance, respectively.
Similarly, the standard deviations calculated from the global
variation data in Fig. 2 give $3\sigma = 37\%$ for critical current
and $3\sigma = 23\%$ for resistance. However, these global variation
statistics do not accurately model the HYPRES process
because HYPRES selects qualified, or in-specification, chips
and therefore cuts off all variations greater than ±15% $J_c$ and
±20% $R$. In the present work, when we calculate theoretical
yields with the "measured" statistics, we use the $3\sigma$ values
calculated from the total global data in Fig. 2. Our simulation
results with the measured statistics are therefore conservative
and do not accurately describe qualified chips selected by
HYPRES. Specifically, for qualified chips with global variations
within the HYPRES specification, the total parameter
deviations including local variations can be as much as ±26%
for $J_c$ and ±22% for $R$. For these approximate data, we see
that the actual parameter values of many individual junctions
and resistors can vary significantly from their nominal values.
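
The spreads quoted above can be reproduced from the standard deviations reported with Fig. 2. The sketch below is only a consistency check; in particular, treating the total deviation for a qualified chip as the specification limit plus the local three-sigma spread is our assumption about how the ±26% and ±22% figures arise.

```python
# Standard deviations (percent) reported with Fig. 2.
sigma = {"Jc_global": 12.5, "R_global": 7.8, "Jc_local": 3.7, "R_local": 0.82}

# Three-sigma spreads: about 37%, 23%, 11% and 2.5% as quoted in the text.
three_sigma = {name: 3.0 * value for name, value in sigma.items()}

# Assumed worst case for an in-spec chip: spec limit plus local 3-sigma.
SPEC_LIMIT = {"Jc": 15.0, "R": 20.0}
total_jc = SPEC_LIMIT["Jc"] + three_sigma["Jc_local"]
total_r = SPEC_LIMIT["R"] + three_sigma["R_local"]

if __name__ == "__main__":
    for name, value in three_sigma.items():
        print(f"{name}: 3 sigma = {value:.1f}%")
    print(f"total Jc = {total_jc:.1f}%, total R = {total_r:.1f}%")
```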

We have therefore designed the COSL circuits specifically 
including process variations. The simulations include 
both local and global variations, shown schematically in 
Fig. 3. In the simulations each nominal value of resistance,
critical current density, and inductance ($R$, $J_c$, and $L$) is
multiplied by Gaussian-distributed random numbers. For
global variations all resistors, junction critical currents, 
and inductances are multiplied by different random factors. 
The R, J c , and L for an entire circuit are therefore shifted 
by different Gaussian distributed global variations. Each 
component on a chip also has local variations, and these 
variations are in addition to the global variations. We model 
the local variations by a second Gaussian superimposed on 
the global variation distribution, Fig. 3. 

A Monte Carlo simulation run including both global and
local variations is implemented in HSPICE [18] as follows. At
the beginning of each iteration all $R$, $J_c$, and $L$ values are
multiplied by their respective chip-wide (global) deviations from
the nominal values. As a second step, local variations are included by
multiplying each individual component by different Gaussian
distributed random numbers. The individual component values
are therefore varied at random around the global parameter
values. Specifically for the Josephson junctions, the local
variation in critical current is included by multiplying the area
$A$ of the junction by a Gaussian-distributed random number
whose standard deviation $\sigma$ is the local variation in critical
current. We are thus
making the assumption that the local variations in junction 
critical current are entirely the result of area variations. The 
Josephson model used in the HSPICE simulations is listed in 
the Appendix. 
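
The two-step sampling described above can be sketched in Python as follows. This is not the HSPICE implementation; it only mimics the sampling logic. The component names and the inductance value are placeholders, the local critical current variation is applied directly to the critical current rather than to the junction area, and the spreads are the $3\sigma$ values quoted in the text for the measured data.

```python
import random

# Three-sigma spreads (fractions) quoted in the text for the measured data.
GLOBAL_3SIGMA = {"R": 0.23, "Jc": 0.37, "L": 0.15}
LOCAL_3SIGMA = {"R": 0.025, "Jc": 0.11, "L": 0.05}

# Placeholder components: name -> (parameter kind, nominal value).
NOMINAL = {
    "R_b": ("R", 8.8),       # ohms
    "R_load": ("R", 5.0),    # ohms
    "J_out": ("Jc", 200.0),  # microamps
    "L_in": ("L", 10.0),     # hypothetical inductance value
}


def sample_chip(nominal, global_3sigma, local_3sigma, rng):
    """Draw one chip: a shared global factor per kind, then local factors."""
    global_factor = {kind: rng.gauss(1.0, spread / 3.0)
                     for kind, spread in global_3sigma.items()}
    sampled = {}
    for name, (kind, value) in nominal.items():
        local_factor = rng.gauss(1.0, local_3sigma[kind] / 3.0)
        sampled[name] = value * global_factor[kind] * local_factor
    return sampled


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_chip(NOMINAL, GLOBAL_3SIGMA, LOCAL_3SIGMA, rng))
```

All components of the same kind share one global factor per draw, which is what distinguishes a chip-wide shift from the independent local scatter.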

The simulation is run for several clock cycles, and the output
is measured to check that all possible combinations of high 
and low values are correct. We input artificial data chosen 
specifically to test all possible digital outputs of the logic 
block. Therefore, all possible combinations of output bits are 
measured and a misfire on any single bit is counted as a failure 
for the entire circuit. The process is then iterated, on the order
of 50 times, with different global and local random numbers to
generate good statistics on the theoretical yield of the circuits.
A minimum of 30 simulation runs is considered necessary to
obtain good statistics [19]; however, 50-100 simulation runs
are preferred. The theoretical yield of a circuit is just the
number of times all output bits are correct divided by the total 
number of trials. This is the probability that, for a chip selected 
at random, the circuit on the chip will operate correctly. 
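
The yield definition can be written as a short Python sketch. The function `run_trial` stands in for one circuit simulation with freshly drawn random parameters and returns one boolean per checked output bit; `toy_trial` is a purely hypothetical stand-in, not a circuit model.

```python
import random


def theoretical_yield(run_trial, n_trials, rng):
    """Fraction of trials in which every checked output bit is correct."""
    passes = 0
    for _ in range(n_trials):
        bits_ok = run_trial(rng)      # one boolean per output bit
        if all(bits_ok):              # any single misfire fails the circuit
            passes += 1
    return passes / n_trials


def toy_trial(rng):
    """Hypothetical stand-in: each bit passes if a random deviation is small."""
    return [abs(rng.gauss(0.0, 0.1)) < 0.2 for _ in range(2)]


if __name__ == "__main__":
    print(theoretical_yield(toy_trial, 50, random.Random(0)))
```

With only 30 to 100 trials the estimate is coarse: each trial changes the reported yield by 1 to 3 percentage points.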

We use Monte Carlo calculations to optimize the cir- 
cuits. The optimization takes place at the gate level. First 
we use JSPICE3 [20] for a standard two-parameter failure 
analysis; by varying two parameters, the simulator plots the 
working parameter region. If the device consisted of only 
two parameters, then the center of the phase space would 
be the optimal operating point. However, real circuits de- 
pend upon more than two parameters so the phase space 
can change significantly if other parameters are also varied. 
For multiparameter circuits the optimal operating point is 
at the center of several intersecting hyperspheres. This is a 
multidimensional problem and it is challenging to calculate 
for general circuits [21]. Two-parameter analysis, while not 
applicable for general optimization, gives some evaluation of 
the sensitivity of each parameter. 

We optimize the circuit by varying nominal values and doing 
Monte Carlo simulations. By repeatedly varying the nominal 
values of the individual components and running Monte Carlo
simulations one can significantly increase the circuit yields. 
Once the basic gate is optimized, it is implemented in a 
larger circuit. We find that basic gates often have a high 
theoretical yield after optimization, but when the gates are 
implemented in larger circuits the yield of the entire circuit 
is significantly lower. We therefore also vary the resistor 
networks connecting the gates and implement different logic 
configurations (for the same logic function) in the large 
circuits. By experimenting with different logic configurations 
and gate designs, and running many Monte Carlo simulations, 
we are able to converge on an overall circuit design with a 
satisfactory theoretical yield. 

In order to see the effect of improving the process, we have 
made use of some artificially constructed spread data. Using 
this artificial data one can clearly see the effects of different 
types and amounts of parameter variations. The artificial data 
has 3σ spreads of 10% for global L and J_c and 15% 3σ for R.
We did two sets of simulations with these artificial variations;
one set of simulations assumed 5% 3σ local variations, and the
second set assumed 10% 3σ local variations. For some of the
circuits we also calculated theoretical yields with zero global 
variations. Finally, we simulated circuits with the measured 
variations of Fig. 2 in an attempt to apply the theory to 
real applications. 




Fig. 4. 2-bit ADC encoder logic: (a) OR/AND implementation, (b) XOR/OR implementation, and (c) the COSL gate layout including buffers corresponding to the logic in (b).



The result of the optimization process is that, even with large 
parameter spreads, the gates have extremely high theoretical 
yields. For example, we calculated gate yields with 10-GHz
clocks and the artificial 3σ spreads. With 3σ local variations
of 10% for all components, in 50 Monte Carlo cycles, the
OR gate has a yield of 94% and the AND/XOR, 100%. The
NOR/NAND gates have slightly lower yield at the gate level,
with 86% in 50 Monte Carlo cycles. A detailed comparison
of the basic COSL gate theoretical yield with the
yield of MVTL gates is described in [3].

In the simulations we also include trimming to cancel 
the global variations. To a certain extent it is possible to 
compensate for the global J_c and R variations by applying
common dc bias currents to the input and output stages of all 
gates. This trimming process is described in detail elsewhere 
[3]. Trimming has been specifically included in the Monte 
Carlo simulations with the result that circuit yields can be 
further increased. At the basic gate level, with trimming, the 
yield of all the single gates is approximately 100% in 50 Monte 
Carlo cycles with the last mentioned parameter spreads. 

In the following section we describe simulation results and 
circuit design considerations for optimal COSL 2- and 3-bit 
encoder circuits for a flash ADC. The logic is compared in 
various configurations and with different process spreads. We 
also show the optimal clocking scheme and gate configurations 
for circuit layouts with high theoretical yields. 

IV. Monte Carlo Optimization of COSL Circuits:
2- and 3-Bit Encoder Circuits for Flash ADC

As two examples we will describe 2- and 3-bit encoder circuits
for a flash ADC implemented using COSL and optimized
using the Monte Carlo method. The flash ADC consists of a
parallel bank of comparators with a logic block to encode the
comparator thermometer code into binary output bits [22], [23].
For a 2-bit encoder, there are three inputs and zero, which are
encoded onto two binary bits.

Fig. 5. 2-bit encoder simulated at 10 Gb/s. The input, shown on top, has been included to demonstrate the full flash ADC.



If A, B, and C are the three low-to-high nonzero comparator
inputs, then the two binary outputs X_0 and X_1 are given by the
following Boolean expressions:

$$X_1 = B \tag{2}$$

$$X_0 = A\bar{B} + C = A \oplus B + C \tag{3}$$

where ⊕ denotes the XOR function. The latter expression
in (3) is true due to redundancies in the Karnaugh map 
[24] with thermometer code inputs. Fig. 4(a) and (b) shows 
two functionally equivalent schematic diagrams of the logic 
functions (2) and (3). MVTL is easiest to implement using the 
logic in Fig. 4(a) whereas COSL gives optimal yields using 
the logic in Fig. 4(b). The full gate layout of the COSL 2- 
bit encoder including the clock phases is shown in Fig. 4(c). 
The gates operate using three-phase sinusoidal clocks, and the
two clock phases for each gate are labeled in the figure. These
clocks (denoted by 1-3) are all 10 mV in amplitude and differ 
in phase by 120°. Note that even though there are only two 
gates in the actual logic function, nine gates are necessary for 
correct phasing of the data and clocks. The additional gates 
are OR and XOR buffers. 

Simulation results for the 2-bit encoder in Fig. 4(c) are 
shown in Fig. 5 with 10-GHz sinusoidal three-phase clocks. 
The simulation assumes a ramped low-frequency input and 
the encoder counts the number of high comparator levels. For
example, when only one comparator is switched the output is
binary 01, two comparator inputs give 10 output, and three
high inputs correspond to binary 11 output.
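
As a quick check of the 2-bit encoder logic, the following sketch evaluates both forms of the X_0 expression for every valid thermometer code and compares the result with the number of high comparators. The complement on B in the AND form is read here as X_0 = A·(not B) + C, which is an assumption about the original typesetting; the function names are illustrative only.

```python
def thermometer(n, width):
    """Thermometer code with the n lowest comparators high, e.g. n=2 -> [1, 1, 0]."""
    return [1 if i < n else 0 for i in range(width)]

def encode_2bit(a, b, c):
    """Return (x1, x0_and_form, x0_xor_form) for comparator inputs a <= b <= c order."""
    x1 = b                           # X1 = B
    x0_and = (a & (1 - b)) | c       # X0 = A.(not B) + C   (AND/OR form)
    x0_xor = (a ^ b) | c             # X0 = (A xor B) + C   (XOR/OR form)
    return x1, x0_and, x0_xor

if __name__ == "__main__":
    for n in range(4):
        a, b, c = thermometer(n, 3)
        x1, x0_and, x0_xor = encode_2bit(a, b, c)
        print(f"{n} high -> inputs {a}{b}{c} -> X1 X0 = {x1}{x0_xor}")
```

The two forms of X_0 agree only because inputs such as A = 0, B = 1 never occur in a thermometer code, which is the Karnaugh-map redundancy the text refers to.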

For flash ADC applications, the next level of complexity is
a 3-bit encoder circuit. In this case there are seven comparator
inputs, with low-to-high values denoted by (A, B, C, D, E, F, G), and
zero, which are encoded onto three binary bits (X_0, X_1, X_2). The
corresponding logic functions are

$$X_2 = D \tag{4}$$

$$X_1 = \bar{D}B + DF \tag{5}$$

$$\phantom{X_1} = \overline{D + \bar{B}} + \overline{\bar{D} + \bar{F}} \tag{6}$$

$$X_0 = \bar{D}(A \oplus B + C) + D\,\overline{(D \oplus E + F)} + G \tag{7}$$

$$\phantom{X_0} = \overline{D + \overline{A \oplus B + C}} + \overline{\bar{D} + D \oplus E + F} + G. \tag{8}$$

The expressions (6) and (8) in X_1 and X_0 result from DeMorgan's
theorem [24]. The logic functions (5) and (7) are
easiest to implement using COSL OR/AND/XOR logic gates,
whereas (6) and (8) are applicable to OR/NOR/XOR gate
implementations.
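
The 3-bit logic functions can be checked exhaustively in the same way. The overbars of the printed equations are not legible in this copy, so the placement of the complements in the sketch below is an assumption, chosen so that the three output bits equal the binary count of high comparators for all eight thermometer codes; the helper names are hypothetical.

```python
def thermometer(n, width=7):
    """Thermometer code with the n lowest comparators high."""
    return [1 if i < n else 0 for i in range(width)]

def inv(x):
    """Logical complement of a 0/1 value."""
    return 1 - x

def nor(x, y):
    """Two-input NOR on 0/1 values."""
    return inv(x | y)

def encode_and_form(a, b, c, d, e, f, g):
    """OR/AND/XOR form, assumed complements: (4), (5), (7)."""
    x2 = d
    x1 = (inv(d) & b) | (d & f)
    x0 = (inv(d) & ((a ^ b) | c)) | (d & inv((d ^ e) | f)) | g
    return x2, x1, x0

def encode_nor_form(a, b, c, d, e, f, g):
    """OR/NOR/XOR form obtained with De Morgan's theorem: (4), (6), (8)."""
    x2 = d
    x1 = nor(d, inv(b)) | nor(inv(d), inv(f))
    x0 = nor(d, inv((a ^ b) | c)) | nor(inv(d), (d ^ e) | f) | g
    return x2, x1, x0

if __name__ == "__main__":
    for n in range(8):
        bits = thermometer(n)
        x2, x1, x0 = encode_and_form(*bits)
        print(f"{n} high -> {''.join(map(str, bits))} -> {x2}{x1}{x0}")
```

Note that D ⊕ E acts as an inverter for E whenever D is high, which matches the use of an XOR gate as the inversion element described in the text.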

Fig. 6(a) is the block diagram of the 3-bit encoder logic 
implemented using OR, AND, and XOR gates. Note that 
in this case we implement the inversion function using an 
XOR gate with one of the inputs pulsed constantly high. The 
pulser is just an OR gate with no inputs and R_b = 6 Ω.






IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY, VOL. 8, NO. 3, SEPTEMBER 1998 



Fig. 6. 3-bit encoder ADC logic: (a) XOR/OR/AND logic implementation and (b) COSL gate layout with three-phase clock including buffers.



The complete gate layout for the 3-bit encoder logic is shown 
schematically in Fig. 6(b), including the clock phasing labeled 
(1-3) on top of each gate. Simulation results for the 3-bit
encoder in Fig. 6(b) are given in Fig. 7. The thermometer code 
in Fig. 7(a) corresponds to a ramped input, and the output is 
the input encoded on the three binary bits. 

We simulated many possible configurations of 2- and 3-bit
encoders (not shown). The gate configurations in Fig. 4(c)
and Fig. 6(b) are the final result of the optimization process.
For optimal yield we found that the gates require a special 
clocking scheme. Specifically, the inputs of all OR/AND gates 
have the same clock as the preceding gate output. This is 
shown schematically in Fig. 4(c) and Fig. 6(b). However, due 
to the novel operation of the XOR gate, described previously 
in Section II, the XOR input must have a clock phase different 
from the previous gates' outputs. As an example, see Fig. 4(c).

Fig. 7. 3-bit encoder simulation results at 10 Gb/s: (a) inputs and (b) three output bits.



Note in Fig. 4(c) that the last two OR gates have the same
clock 1 input as the XOR gates' outputs, and that the XOR
gates have an input clock 3 which follows the OR gate buffer
clock 2 output. Furthermore, to eliminate switching errors for
the one-junction SQUID input stage, when connecting gates
we use 10-Ω resistive matching networks (see Fig. 1) to reduce
the current input to the gate when there is a fan-in of two. A
5-Ω series resistor is used for direct coupling, fan-in of one,
of all logic gates.

We have simulated both the 2- and 3-bit encoder circuits
repeatedly using several gate configurations and different
global and local process parameter spreads. Fig. 8 shows the
final Monte Carlo results for the 2-bit encoder with 5- and
10-GHz sinusoidal clocks and the 3-bit encoder with 10-GHz
clocks. The 2-bit encoder simulations are for 100 Monte Carlo
cycles, and the 3-bit simulations are for 50 Monte Carlo cycles.
In these plots the first two sets of data bars are for artificial
global variations of 3σ = 15% for R and L and 3σ = 10%
for J_c; the first set of data bars have local variations of 5%
3σ on all parameters, and the second set of data bars have
10% 3σ local variations on all parameters. The third set of
data bars are the results of simulations with the measured
variations (Fig. 2): global variations of 3σ = 37% for J_c and
3σ = 23% for R, and local variations of 3σ = 11% for
J_c and 3σ = 2.5% for R. For the simulations with measured
variations we approximate the inductance variation as 15%
3σ globally, and 5% 3σ locally.

Fig. 8. (a) Simulated theoretical yields for the 2-bit encoder operating at
5 Gb/s. In the plot the first two data sets are for artificial global variations
and 5 and 10% 3σ local process variations. The third set of data is theoretical
yield calculated using the measured process variations. (b) The same as (a)
except the clocks are 10 GHz. (c) 3-bit encoder Monte Carlo simulation results
at 10 GHz with artificial global variations and 5 and 10% 3σ local process
parameter spreads and measured parameter spreads. Yield results are also
shown for 5 and 10% local variations with zero global variations.

Note that all Monte Carlo simulation results have a statisti- 
cal uncertainty which is a function of the number of trials and 
the calculated yield. The yield accuracy analysis is described 
in detail elsewhere [3], [19]. However, without going into 
the details of this analysis, it is useful to put the accuracy 
of Monte Carlo simulation results in perspective. A yield of 
92% will have uncertainties of ±7.7% for 50 MC cycles, 
±5.4% for 100 MC cycles, and ±3.8% for 200 MC cycles. 
A yield of 66%, on the other hand, will give uncertainties of 
±13.5% for 50 MC cycles, ±9.5% for 100 MC cycles, and 
±6.7% for 200 MC cycles. Therefore, simulations with low 
yields have larger uncertainties, and increasing the number 
of Monte Carlo cycles decreases the statistical uncertainty. 
The results of Fig. 8 are therefore not "exact" but describe the 
approximate theoretical yields of the circuits within a statistical 
error range. For clarity in Fig. 8 we have included error bars 
for the statistical uncertainty of each simulation result. 
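
The quoted uncertainties are closely reproduced by two standard deviations of a binomial proportion, 2·sqrt(Y(1−Y)/N). That formula is an assumption made here for illustration; the analysis actually used is the one in [3], [19]. A minimal sketch, with a hypothetical function name:

```python
import math

def yield_uncertainty(y, n_trials, k=2.0):
    """k standard deviations of a binomial proportion y estimated from n_trials runs."""
    return k * math.sqrt(y * (1.0 - y) / n_trials)

if __name__ == "__main__":
    for y in (0.92, 0.66):
        for n in (50, 100, 200):
            u = 100.0 * yield_uncertainty(y, n)
            print(f"yield {y:.0%}, {n:3d} MC cycles: +/- {u:.1f}%")
```

Because the uncertainty falls only as 1/sqrt(N), halving an error bar requires four times as many Monte Carlo cycles.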

The results of the 2-bit encoder for 100 Monte Carlo cycles 
are given with and without dc bias trimming, and for an 
equivalent MVTL encoder. The MVTL gates used in the 
encoder were modified to give high yield at 10 GHz [3]. 
The yield of the COSL 2-bit encoder is dependent upon the 
local variations; at 10 GHz the yield is 87%±6.7% with 
5% local variations, and this drops to 78%±8.3% with 10% 
local variations. However, the addition of dc bias trimming 
increases the yield to approximately 90%±5.4% in both cases. 
We also calculated the theoretical yield of the 2-bit encoder 
implemented using MVTL gates. At 10 GHz with 5% 3σ local
variations the MVTL yield was 77%±8.4%, and with 10%
local variation the yield drops to approximately 69%±9.2%.

From the Monte Carlo simulations we found that even 
though the NOR/NAND inversion functions have a high 
theoretical yield individually, when the NOR/NAND gates are 
included in large circuits the yield of the system is significantly 
less than expected. At 10 GHz with 5% 3σ local variation
the 3-bit encoder OR, NOR, and XOR implementation had a
yield of approximately 45%±15% in 50 Monte Carlo cycles
(not shown in Fig. 8). However, the XOR function, which is
almost identical to the OR/AND gate, has the same theoretical
yield as the OR/AND gates. The theoretical yield of the 3-bit
encoder, simulated for 50 Monte Carlo cycles, is shown
in Fig. 8(c) using the XOR inversion architecture shown in
Fig. 7(c). The theoretical yield with 5% local 3σ variations is
80%±11.3% without trimming. When the local variations are
increased to 10% 3σ the yield is 62%±13.7%. We therefore
chose to implement an inversion function using the COSL 
XOR gate to give maximum yields for large circuits. 

Note from Fig. 8(c) that, as expected, the yield of the 3-bit
encoder decreases as the local variations are increased. Also,
comparing 2- and 3-bit encoder circuits, large COSL circuits
are more sensitive to the local variations than are small COSL
circuits. Fig. 8(c) also shows the 3-bit encoder yield with zero
global variations and no trimming; with 5% local variation the
theoretical yield is 100% and with 10% local variation, 96%
(−5.4%, +4%) in 50 Monte Carlo cycles. This is an important
result, and it shows that decreasing the global variations is the
most significant factor for increasing the theoretical yield.

Finally, we have calculated the 2- and 3-bit encoder
yields using the measured process variations. At 10 GHz,
the COSL 2-bit encoder yield was 56%±9.5%, and the 3-bit
was 36%±13.6%. The 2-bit MVTL encoder had a yield
of 35%±9.5%. With trimming the yield of the COSL 2-bit
encoder increased to 66%±9.5%. Clearly, large global
variations have a significant effect on the theoretical yield of
the circuits.

Some of the uncertainty ranges, given by error bars in
Fig. 8, overlap, and from a statistical standpoint it is not
possible to draw firm conclusions comparing these specific
cases. However, the data overlap is typically small, occurring
at the top and bottom of adjacent data uncertainty intervals,
so that one can see the trend in these data. Excessively long
simulation times prevented us from calculating the theoretical
yields for more Monte Carlo cycles, which would have in turn
reduced the uncertainties. However, note that at 5 and 10 GHz
the COSL 2-bit encoder yield with trimming is better, in a strict
statistical sense, than MVTL for all variations. Furthermore,
at 10 GHz with measured variations the COSL yield without
trimming is also significantly better, in a strict statistical sense,
than MVTL. These data therefore clearly demonstrate that
COSL gates have a higher probability of operating successfully
in the frequency range of 5-10 GHz than MVTL.

The important point is that these simulations show that
the fabrication process plays a significant role in successfully
demonstrating working circuits. The Monte Carlo method
enables one to evaluate the expected yield of a circuit with
many different nominal parameters. Circuits are optimized by
choosing parameters which maximize the theoretical yield not
only of the component gates, but also of the entire circuit.
The circuit yield therefore acts as a pretest evaluation of how
well the circuit has been optimized and correlates directly with
the probability of fabricating working circuits. Furthermore,
the results in Fig. 8(c) with no global variations demonstrate
that if the global variations of the parameter spreads can be
minimized, possibly by choosing specific wafers [25], then
the theoretical yield of complex superconducting circuits
dramatically increases. Of course, decreasing the local parameter
spreads also increases the yield of superconducting circuits.

Fig. 9. (a) 5 mm × 5 mm 2-bit encoder chip fabricated using the HYPRES process. (b) Expanded view of the 2-bit encoder (the area in the photograph is 1.3 × 1.5 mm²). (c) COSL XOR gate in the encoder layout (the area is 233 × 245 µm²).

V. Circuit Layout and Experimental Test Results 

We had COSL gates and 2-bit encoder circuits fabricated
using the 1-kA/cm² HYPRES process. A photograph of the
COSL 2-bit encoder 5 mm × 5 mm chip is shown in Fig. 9(a).
The chip contains two 2-bit encoders. An expanded view of
a single encoder is shown in Fig. 9(b), and one of the XOR
gates in Fig. 9(c). In Fig. 9(b) the three-phase clocks are at the
top, the three thermometer code inputs are on the left, the two
binary outputs are on the right, and dc bias lines are on the
bottom. The gate outputs are amplified to 2.5 mV for detection
off-chip using single-junction output amplifiers shown on the
right in Fig. 9(b).

Note that the Josephson junctions in the gate of Fig. 9(c) are
surrounded by moats, or holes in the ground plane. These holes
enclose approximate areas of 60 × 70 µm² and should shield
the circuit for magnetic fields up to Φ_0/A = 5 mG, where Φ_0
is the flux quantum h/2e [4].
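
The 5-mG figure follows directly from one flux quantum threading one moat. A short worked check, assuming the standard value of h/2e and the 60 × 70 µm² moat area quoted above (the function name is illustrative):

```python
PHI0_WB = 2.067833848e-15  # magnetic flux quantum h/2e, in webers

def moat_shielding_field_mG(width_um, height_um):
    """Field at which one flux quantum threads a moat of the given size, in milligauss."""
    area_m2 = (width_um * 1e-6) * (height_um * 1e-6)
    b_tesla = PHI0_WB / area_m2
    return b_tesla * 1e4 * 1e3  # tesla -> gauss -> milligauss

if __name__ == "__main__":
    print(f"Phi0/A = {moat_shielding_field_mG(60, 70):.2f} mG")
```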
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We have used Monte Carlo simulations to calculate the 
effect of impedance mismatches. We found from these simulations
that if the clock lines and connections between gates
are not impedance-matched, the yield of the circuits at 10
GHz is reduced. All of the transmission lines are therefore
impedance-matched for the circuit in Fig. 9. The inputs and
outputs of the gates are matched to 5 Ω. The gate clock lines
have impedances of 6.7 Ω (clock 1) and 13.4 Ω (clock 2),
and these clock lines are combined into a matched tree that
has a characteristic impedance of 1.6 Ω at the pad. Resistive
matching networks are used for the inputs of the bottom 2-bit
encoder in Fig. 9(a).

Resistive matching networks are especially useful for broadband
testing. However, since the superconductive circuits have
an intrinsically low impedance, a large resistance is required
to match to 50 Ω. This resistor dissipates excessive power, and
it is impractical to use resistive matching for the three-phase
clocks. To avoid excessive heating of the chip, we therefore
mismatch the 1.6-Ω clock lines at the pads in Fig. 9 to the 50-Ω
cables external to the chip. Since the clocks are sinusoids, a
large reflected component in the coaxial cables does not affect
the shape of the signal input to the chip.

Experimental test results are given for the 2-bit encoder 
operating at 1 Gb/s in Fig. 10. The 1-Gb/s inputs are generated 
using an HP 80000 data generator, and the sinusoidal clock is 
generated by an HP 85735 synthesizer signal generator. The 
inputs and outputs are observed on a Tektronix 11801A digital
sampling oscilloscope. The chip is mounted on the end of an 
American Cryoprobe high-speed probe [26] surrounded by two 
mumetal shields and immersed in a liquid helium dewar. The 
experimental data in Fig. 10 corresponds to the simulation in 
Fig. 5. The thermometer code inputs are the three top traces 
in Fig. 10, and the two binary output bits are the bottom two 
traces. The input and output data are shifted by approximately 
15 ns; this is the time delay of the signal as it propagates 
in the cables from the chip. The 2.5-mV superconducting 
circuit outputs have been averaged for the photograph in Fig. 10.
However, no averaging is necessary if one uses a low-noise 
amplifier [27]. 

At low frequencies (5 kHz) it is straightforward to apply 
the nominal 10-mV clock amplitudes to the circuit; however,
at 1 GHz the test setup is more complicated. All room-temperature
electronics have a characteristic impedance of 50 Ω.
The high-speed probe has 50-Ω cables and is impedance
mismatched to the 1.6-Ω clock transmission lines at the pads
of the chip. For high-speed testing the clock amplitudes are
measured using the 50-Ω sampling scope before they are
connected to the high-speed probe. We found that due to the
impedance mismatch and loss in the cables, 400-mV clock
amplitudes are required, measured by the 50-Ω scope, to give
the nominal 10-mV clock amplitudes at the chip.
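
A rough estimate shows how much of the 400-mV to 10-mV reduction the impedance step alone can explain. The sketch assumes an ideal lossless 50-Ω cable driving a matched 1.6-Ω line, so the transmitted voltage fraction is 2·Z_load/(Z_load + Z_source); whatever remains is attributed here, as an assumption, to cable loss and other effects not modeled.

```python
import math

def transmitted_fraction(z_source, z_load):
    """Voltage transmission coefficient at a step from z_source to z_load."""
    return 2.0 * z_load / (z_load + z_source)

def remaining_loss_db(v_scope, v_chip_nominal, z_source=50.0, z_load=1.6):
    """Loss (in dB) not explained by the impedance step alone."""
    v_after_step = v_scope * transmitted_fraction(z_source, z_load)
    return -20.0 * math.log10(v_chip_nominal / v_after_step)

if __name__ == "__main__":
    v_step = 0.400 * transmitted_fraction(50.0, 1.6)
    print(f"after the 50-to-1.6-ohm step: {1e3 * v_step:.1f} mV")
    print(f"remaining loss to reach 10 mV: {remaining_loss_db(0.400, 0.010):.1f} dB")
```

Under these assumptions the mismatch accounts for most of the reduction, and the balance of roughly 8 dB would have to come from the cables and probe.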

Fig. 11 shows the same circuit clocked at 4 GHz. In this case
we use the divide-by-four of an NEL NG4218 multiplexer to
phase lock the clock with the HP 80000 data generator. The
inputs in Fig. 11(a) are 1 Gb/s RZ and, since the circuit is
clocked at 4 Gb/s, there are two pulses output in Fig. 11(b)
for each input pulse. The large oscillation on the background
is due to imperfect balancing of the three-phase clocks and can
be reduced significantly by the addition of a fourth clock phase
[27]. We have thus far demonstrated the full 2-bit encoder at
5 Gb/s, with portions of the circuit operating up to 8 Gb/s. We
have also demonstrated the basic COSL OR/AND gate at 10
Gb/s and measured bit error rates. These results are described
in detail elsewhere [2], [27].

Fig. 10. 2-bit encoder operating at 1 Gb/s. The three inputs are shown at
the top and the two output bits on the bottom.

Fig. 11. 2-bit encoder clocked at 4 Gb/s with 1 Gb/s inputs: (a) the three
inputs and (b) the two output bits.



VI. Summary and Conclusions 

Experimental data for global and local process variations 
have been presented. For Josephson superconducting circuits 
these process variations can significantly reduce the proba- 
bility of obtaining working circuits at ultra-high speed. An 
optimization technique was described such that basic gates 
are optimized using the Monte Carlo method and then incor- 
porated into larger circuits. We used the Monte Carlo method 
to simulate basic gates and complex circuits in order to realize 
logic functions with maximum yield.

We specifically optimized COSL gates and circuits for
5-10 GHz operation using the Monte Carlo method. After
optimization the basic gates have a very high theoretical yield,
approaching 100% in 50 Monte Carlo cycles at 10 GHz. With
large global (>10% 3σ) and local (10% 3σ) variations, COSL
2-bit encoder circuits have yields of 78%±8.3% in 100 Monte
Carlo cycles, and 3-bit encoders have yields of 62%±13.7%
in 50 Monte Carlo cycles. The addition of dc bias trimming to
cancel global process variations increased the theoretical yield.
With zero global variations and 10% 3σ local variations, the
yield of complex 3-bit encoder circuits was 96%±5.5% with
no dc bias trimming. Compared to similar MVTL circuits,
COSL logic has a significantly higher theoretical yield at
10 Gb/s. These results contrast the effects of global and local
variations and increasing circuit complexity.

We also presented 1- and 4-Gb/s test results on COSL 2-bit
encoder circuits. We discussed gate layouts, impedance
matching, and optimal clock phasing. Basic COSL gates have 
been demonstrated at 10 Gb/s, and COSL 2-bit encoder circuits 
at 5-8 Gb/s. These results are described in detail elsewhere [2]. 

Monte Carlo optimization is relatively simple to implement 
and has the advantage that the calculated yields correlate 
directly with the probability of fabricating working circuits. 
Furthermore, our simulation results demonstrate quantitatively 
the effect of improving the process. 



Appendix 

HSPICE Subcircuit for a Josephson Junction 

The RSJ model is used to model the Josephson junction. It
consists of a parallel connection of a basic Josephson junction,
a voltage-dependent resistor, and the junction capacitance. The
total Josephson current is given by

$$i = I_c \sin\varphi + \frac{v}{R_{\mathrm{shunt}}} + C\,\frac{dv}{dt} \tag{9}$$

and the voltage across the junction as

$$v = \frac{\Phi_0}{2\pi}\,\frac{d\varphi}{dt}. \tag{10}$$

The gauge invariant phase difference φ is generated in the
model by taking (10) as the governing equation of a capacitor.
The voltage v across the junction is monitored and converted
to a current with the same magnitude. This current is fed to
a series capacitor with a capacitance of Φ_0/2π. The voltage
across the capacitor is thus a representation of φ. The initial
value of φ is easily implemented as the initial value of the
voltage across the capacitor.

The magnitude of the critical current and the associated 
junction capacitance, normal resistance, and subgap resistance 
are calculated from an area factor, which is passed to the 
Josephson subcircuit. In the 1-kA/cm² HYPRES process, an
area factor of 1 represents an area of 10 µm² and thus a critical
current I_c = 100 µA. The HSPICE subcircuit description for
the 1-kA/cm² HYPRES process follows:

.subckt jj 2 4 area=1 ij=100u rn=26 rg=300
+ cj=0.4p jc=1 vg=2.6m dlv=0.3m phi=0
c1 2 4 c='cj*area' ctype=1
gr 2 4 vcr pwl(1) 2 4
+ '-1*vg','rn/area'
+ '-1*(vg-dlv)','rg/area'
+ 'vg-dlv','rg/area'
+ 'vg','rn/area'
gjos 2 4 cur='ij*area*sin(v(3,4)*10k)'
gphi 4 3 cur='v(2,4)'
rphi 4 3 1000g
cphi 3 4 3.291090p ic='phi'
.ends jj

The capacitor cphi was scaled by a factor of 10000 to 
increase numerical stability and accuracy during simulations. 
This factor is also reflected in the expression for gjos. 
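
For readers without HSPICE, the RSJ equations can also be integrated directly. The sketch below is a simplified illustration, not a translation of the subcircuit: it assumes a constant shunt resistance equal to rn instead of the piecewise-linear voltage-dependent resistor, uses a fixed-step fourth-order Runge-Kutta integrator, and all function names are hypothetical. It also reproduces the scaled capacitor value, 10^4·Φ_0/2π, which comes out close to the 3.291090 pF used in the listing.

```python
import math

PHI0 = 2.067833848e-15  # flux quantum h/2e in Wb

def scaled_cphi(scale=1.0e4):
    """Capacitance Phi0/(2*pi), multiplied by the numerical scaling factor."""
    return scale * PHI0 / (2.0 * math.pi)

def rsj_mean_voltage(i_bias, ic=100e-6, r=26.0, c=0.4e-12,
                     t_end=200e-12, dt=0.01e-12):
    """Integrate the RSJ equations with RK4; return the mean voltage over
    the second half of the run (constant-R simplification)."""
    k = 2.0 * math.pi / PHI0  # d(phi)/dt = k * v

    def deriv(phi, v):
        return k * v, (i_bias - ic * math.sin(phi) - v / r) / c

    steps = int(round(t_end / dt))
    half = steps // 2
    phi, v, phi_half = 0.0, 0.0, 0.0
    for n in range(steps):
        if n == half:
            phi_half = phi
        k1p, k1v = deriv(phi, v)
        k2p, k2v = deriv(phi + 0.5 * dt * k1p, v + 0.5 * dt * k1v)
        k3p, k3v = deriv(phi + 0.5 * dt * k2p, v + 0.5 * dt * k2v)
        k4p, k4v = deriv(phi + dt * k3p, v + dt * k3v)
        phi += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    # Mean voltage follows from the accumulated phase: <v> = (Phi0/2pi) * dphi/dT.
    return (phi - phi_half) / k / ((steps - half) * dt)

if __name__ == "__main__":
    print(f"scaled cphi = {scaled_cphi() * 1e12:.4f} pF")
    for frac in (0.5, 1.2):
        v_mean = rsj_mean_voltage(frac * 100e-6)
        print(f"I = {frac:.1f} Ic -> mean voltage {v_mean * 1e3:.3f} mV")
```

With a bias below the critical current the junction stays in the zero-voltage state, and above it the junction switches to a finite average voltage, which is the switching behavior the logic gates rely on.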

Note that we make no warranties, expressed or implied, that 
the above subroutine is free of errors. The authors disclaim any 
liability for direct or consequential damages resulting from the 
use of this subroutine. 

ACKNOWLEDGMENT 

The authors gratefully acknowledge M. Feldman and 
D. K. Brock from the University of Rochester for providing 
the global process variation data. Dr. Brock has recently 
joined HYPRES. We also would like to thank O. Mukhanov 
of HYPRES and K. Likharev of SUNY for useful discussions. 

References 

[1] S. Hasuo and T. Imamura, "Digital logic circuits," Proc. IEEE, vol. 77, pp. 1177-1193, Aug. 1989.
[2] M. Jeffery, W. Perold, and T. Van Duzer, "Superconducting complementary output switching logic operating at 5-10 Gb/s," Appl. Phys. Lett., vol. 69, pp. 2746-2748, Oct. 1996.
[3] W. J. Perold, M. Jeffery, Z. Wang, and T. Van Duzer, "Complementary output switching logic: A new voltage-state logic family," IEEE Trans. Appl. Superconduct., vol. 6, pp. 125-131, Sept. 1996.
[4] M. Jeffery, T. Van Duzer, J. R. Kirtley, and M. B. Ketchen, "Magnetic imaging of moat-guarded superconducting electronic circuits," Appl. Phys. Lett., vol. 67, pp. 1769-1771, Sept. 1995.
[5] C. Hamilton and K. C. Gilbert, "Margins and yield in single flux quantum logic," IEEE Trans. Appl. Superconduct., vol. 1, pp. 157-163, Dec. 1991.
[6] A. D. Smith, S. L. Thomasson, and C. Dang, "Reproducibility of niobium junction critical currents: Statistical analysis and data," IEEE Trans. Appl. Superconduct., vol. 3, pp. 2174-2177, Mar. 1994.
[7] N. Fujimaki, S. Kotani, T. Imamura, and S. Hasuo, "Josephson modified variable threshold logic gates for use in ultra-high-speed LSI," IEEE Trans. Electron Devices, vol. 36, pp. 433-446, Feb. 1989.
[8] N. Fujimaki, T. Imamura, and S. Hasuo, "Josephson pseudorandom bit-sequence generator," IEEE J. Solid-State Circuits, vol. 23, pp. 852-858, June 1988.
[9] K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: A new Josephson junction technology for sub-terahertz-clock-frequency digital systems," IEEE Trans. Appl. Superconduct., vol. 1, pp. 3-27, 1991.
[10] R Yoshikawa, Z. J. Deng, S. R. Whiteley, and T. Van Duzer, "Design and testing of data-driven self-timed RSFQ demultiplexer," Extended Abstracts 6th Int. Superconducting Electronics Conf., Berlin, Germany, June 25-28, 1997, pp. 353-355.
[11] T. Harnisch, J. Kunert, H. Toepfer, H. F. Uhlmann, "Design centering methods for yield optimization of cryoelectronic circuits," IEEE Trans. Appl. Superconduct., vol. 7, pp. 3434-3437, June 1997.
[12] T. Van Duzer and C. W. Turner, Principles of Superconducting Devices and Circuits. New York: Elsevier, 1981, pp. 165-244.
[13] HYPRES Inc. 175 Clearbrook Road, Elmsford, NY 10523 USA; design rules are available via the HYPRES home page at http://www.hypres.com.
[14] J. Coughlin, HYPRES Inc., private communication.
[15] O. Mukhanov, HYPRES Inc., private communication.
[16] K. Gaj, Q. P. Herr, and M. J. Feldman, "Parameter variations and synchronization of RSFQ circuits," Applied Superconductivity, D. Dew-Hughes, Ed. Bristol, U.K.: Inst. Physics, 1995, pp. 1733-1736.
[17] S. Polonsky, SUNY Stony Brook, private communication.
[18] HSPICE, Meta-Software Inc., 1300 White Oaks Road, Campbell, CA 95008 USA.
[19] R. Spence and R. S. Soin, Tolerance Design of Electronic Circuits. New York: Addison-Wesley, 1988, pp. 56-87.
[20] S. R. Whiteley, "Josephson Junctions in Spice3," IEEE Trans. Mag., vol. 27, no. 2, pp. 2902-2905, Mar. 1991.
[21] Q. P. Herr and M. J. Feldman, "Multiparameter optimization of RSFQ circuits using the method of inscribed hyperspheres," IEEE Trans. Appl. Superconduct., vol. 5, pp. 3337-3340, June 1995.
[22] E. S. Fang, D. Hebert, and T. Van Duzer, "A multi-gigahertz Josephson flash A/D converter with a pipelined encoder using large-dynamic-range current-latch comparators," IEEE Trans. Mag., vol. 27, pp. 2891-2894, Mar. 1991.
[23] H. Luong, D. Hebert, and T. Van Duzer, "Fully parallel superconducting analog-to-digital converter," IEEE Trans. Appl. Superconduct., vol. 3, pp. 2633-2636, Mar. 1993.
[24] P. Horowitz and W. Hill, The Art of Electronics, 2nd ed. New York: Cambridge Univ. Press, 1991, pp. 490-494.
[25] X. Meng, H. Jiang, A. Bhat, and T. Van Duzer, "Precise control of critical current and resistance in a Nb/AlOx/Nb integrated circuit process," Extended Abstracts 6th Int. Superconducting Electronics Conf., Berlin, Germany, June 25-28, 1997, pp. 164-166.
[26] D. Petersen, American Cryoprobe, 5323 347th Place, SE Fall City, WA 98024 USA.
[27] M. Jeffery, W. Perold, and T. Van Duzer, "Experimental demonstration of complementary output switching logic approaching 10 Gb/s clock frequencies," IEEE Trans. Appl. Superconduct., vol. 7, pp. 2665-2668, June 1997.



Mark Jeffery (M'94) received the Ph.D. degree in 
physics from Drexel University, Philadelphia, PA, 
in 1991. 

He spent two years as an NSF/STA Fellow at the 
Goto Laboratory RIKEN in Japan and is presently 
working as an Assistant Research Electrical Engi- 
neer in the cryoelectronics group at the University of 
California, Berkeley. His research interests include 
high-speed testing and low- and high-Tc supercon- 
ducting devices and systems. 





Willem J. Perold (M'87) graduated from the University
of Stellenbosch, South Africa, in 1976. He
received the Masters and Ph.D. degrees from the
same university in 1977 and 1986, respectively.

He has been with the Department of Electrical and 
Electronic Engineering, University of Stellenbosch, 
since 1982. His research interests include solid state 
physics, computer simulation, and superconducting 
electronic circuits. 




Zuoqin Wang was born in Changchun, China, in
1957. She received the M.S. degree in electrical
engineering from Changchun University of Earth
Sciences, China, in 1986. She recently received the 
M.E. degree in the cryoelectronics group at the 
University of California, Berkeley. 



Theodore Van Duzer (S'52-M'60-SM'75-F'77-LF'93)
received the Ph.D. degree in 1960 from the
University of California, Berkeley.

He has been on the faculty of the Department of Electrical
Engineering and Computer Sciences, University
of California, Berkeley, since 1961. He has
thored two textbooks, Principles of Superconduc- 
tive Devices and Circuits and Fields and Waves in 
Communication Electronics , and has published ex- 
tensively in the field of superconductive and hybrid 
superconductor-semiconductor devices and systems. 
His research interests include superconductive devices and digital circuits, and 
hybrids. 

Dr. Van Duzer led the establishment of IEEE Transactions on Applied 
Superconductivity and served as its first Editor-in-Chief. He is active in 
the leadership of conferences in the field. He is a member of the National 
Academy of Engineering and was awarded the Berkeley Citation in 1993. 



